GenCore version 5.1.6 
Copyright (c) 1993 ~ 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: March 26, 2004, 15:57:50 ; Search time 36.6674 Seconds 

(without alignments)

Title: US-10-092-390-4
Perfect score: 3601
Sequence: 1 MVISLNSCLSFICLLLCHWI ... HCDSVCAEGRWGPNCSLPCY 586



Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5 

Searched: 1586107 seqs, 282547505 residues 

Total number of hits satisfying chosen parameters: 1586107



Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100%
Listing first 45 summaries

Database: A_Geneseq_29Jan04:*
1: geneseqp1980s:*
2: geneseqp1990s:*
3: geneseqp2000s:*
4: geneseqp2001s:*
5: geneseqp2002s:*
6: geneseqp2003as:*
7: geneseqp2003bs:*
8: geneseqp2004s:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution.

ALIGNMENTS



RESULT 1 
AAE27986 

ID AAE27986 standard; protein; 586 AA. 
XX 

AC AAE27986; 
XX 

DT 27-JAN-2003 (first entry) 
XX 

DE Human EGF-family protein #2. 
XX 

KW Human; EGF-family protein; novel human protein; NHP; drug discovery;

KW restriction fragment length polymorphism analysis; forensic biology; 

KW toxicity; infectious disease; biological disorder; medical disorder; 

KW mental disorder; gene therapy. 
XX 

OS Homo sapiens. 



XX 

PN WO200272611-A2.
XX

PD 19-SEP-2002.
XX

PF 06-MAR-2002; 2002WO-US007477.
XX

PR 12-MAR-2001; 2001US-0275013P.
XX 

PA (LEXI-) LEXICON GENETICS INC. 
XX 

PI Yu X, Miranda M; 
XX 

DR WPI; 2002-723315/78. 

DR N-PSDB; AAD46319. 
XX 

PT New novel human nucleic acids useful for e.g. identifying protein coding 

PT sequences and mapping unique genes to a particular chromosome, as DNA 

PT markers for restriction fragment length polymorphism analysis, or in 

PT forensic biology. 
XX 

PS Claim 2; Page 40-42; 42pp; English. 
XX 

CC The present sequence is EGF-family protein, a novel human protein (NHP).

CC The NHP sequences are useful for mapping unique genes to a particular 

CC chromosome; as DNA markers for restriction fragment length polymorphism 

CC analysis; in forensic biology; in defining and monitoring both drug 

CC action and toxicity; in identifying, selecting and validating novel 

CC molecular targets for drug discovery; in microarrays or other assay 

CC formats to screen collections of genetic material from patients who have 

CC a particular medical condition. The NHP peptides, fusion proteins, 

CC antibodies, antagonists and agonists can be used for detecting mutant 

CC NHPs or inappropriately expressed NHPs for the diagnosis of disease; for 

CC screening drugs for treatment of symptomatic or phenotypic manifestations 

CC of perturbing the normal function of NHP in the body and to treat 

CC diseases including infectious, mental, biological, or medical diseases or 

CC disorders. They are also used in gene therapy 

XX 

SQ Sequence 586 AA; 

Query Match 100.0%; Score 3601; DB 5; Length 586; 

Best Local Similarity 100.0%; Pred. No. 2.2e-158; 

Matches 586; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy   1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60

Qy  61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120

Qy 121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180

Qy 181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240

Qy 241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300

Qy 301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360

Qy 361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420

Qy 421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480

Qy 481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540

Qy 541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586
       ||||||||||||||||||||||||||||||||||||||||||||||
Db 541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586



RESULT 2
AAE27985

ID AAE27985 standard; protein; 1140 AA.
XX

AC AAE27985;
XX

DT 27-JAN-2003 (first entry)
XX

DE Human EGF-family protein #1.
XX

KW Human; EGF-family protein; novel human protein; NHP; drug discovery;
KW restriction fragment length polymorphism analysis; forensic biology;
KW toxicity; infectious disease; biological disorder; medical disorder;
KW mental disorder; gene therapy.
XX

OS Homo sapiens.
XX

PN WO200272611-A2.
XX

PD 19-SEP-2002.
XX

PF 06-MAR-2002; 2002WO-US007477.
XX

PR 12-MAR-2001; 2001US-0275013P.
XX

PA (LEXI-) LEXICON GENETICS INC.
XX

PI Yu X, Miranda M;
XX

DR WPI; 2002-723315/78. 

DR N-PSDB; AAD46318. 
XX 

PT New novel human nucleic acids useful for e.g. identifying protein coding 

PT sequences and mapping unique genes to a particular chromosome, as DNA 

PT markers for restriction fragment length polymorphism analysis, or in 

PT forensic biology. 
XX 

PS Claim 2; Page 37-40; 42pp; English. 
XX 

CC The present sequence is EGF-family protein, a novel human protein (NHP).

CC The NHP sequences are useful for mapping unique genes to a particular 

CC chromosome; as DNA markers for restriction fragment length polymorphism 

CC analysis; in forensic biology; in defining and monitoring both drug 

CC action and toxicity; in identifying, selecting and validating novel 

CC molecular targets for drug discovery; in microarrays or other assay 

CC formats to screen collections of genetic material from patients who have 

CC a particular medical condition. The NHP peptides, fusion proteins, 

CC antibodies, antagonists and agonists can be used for detecting mutant 

CC NHPs or inappropriately expressed NHPs for the diagnosis of disease; for 

CC screening drugs for treatment of symptomatic or phenotypic manifestations 

CC of perturbing the normal function of NHP in the body and to treat 

CC diseases including infectious, mental, biological, or medical diseases or 

CC disorders. They are also used in gene therapy 

XX 

SQ Sequence 1140 AA; 

Query Match 100.0%; Score 3601; DB 5; Length 1140; 

Best Local Similarity 100.0%; Pred. No. 3.6e-158; 

Matches 586; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 



Qy   1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60

Qy  61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120

Qy 121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180

Qy 181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240

Qy 241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300

Qy 301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360

Qy 361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420

Qy 421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480

Qy 481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540

Qy 541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586
       ||||||||||||||||||||||||||||||||||||||||||||||
Db 541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586



RESULT 3 
ADD18688 

ID ADD18688 standard; protein; 1140 AA. 
XX 

AC ADD18688;
XX 

DT 15-JAN-2004 (first entry) 
XX 

DE Human disease related protein SeqID119. 
XX 
KW human; disease state; cytostatic; antiinflammatory; ophthalmological;
KW antiarteriosclerotic; vulnerary; gene therapy;
KW hypoxia-regulated condition; tumourigenesis; angiogenesis; apoptosis;
KW inflammation; erythropoiesis; glycolysis; gluconeogenesis;
KW glucose transportation; catecholamine synthesis; iron transport;
KW nitric oxide synthesis; cancer; ischaemic condition; reperfusion injury;
KW retinopathy; neonatal stress; pre-eclampsia; atherosclerosis;
KW inflammatory condition; wound healing.
XX 

OS Homo sapiens. 
XX 

PN WO2003018621-A2.
XX

PD 06-MAR-2003.
XX

PF 23-AUG-2002; 2002WO-GB003892.
XX

PR 23-AUG-2001; 2001GB-00020558.

PR 05-OCT-2001; 2001GB-00024037.
XX 

PA (OXFO-) OXFORD BIOMEDICA UK LTD. 
XX 

PI Kingsman SM, White J, Ward NR, Harris RA, Naylor S, Mundy CR; 
XX 

DR WPI; 2003-290046/28. 

DR N-PSDB; ADD18689. 
XX 

PT New substantially purified polypeptide, useful for diagnosing or treating 

PT a hypoxia-regulated condition, such as cancer, ischemia, reperfusion 

PT injury, retinopathy, pre-eclampsia, atherosclerosis, inflammation, or 

PT wound healing. 
XX 



PS Claim 25; SEQ ID NO 119; 424pp; English. 
XX 

CC This invention relates to novel human genes and gene product which are 

CC implicated in certain disease states. Compounds which modulate the 

CC proteins of the invention may have cytostatic, antiinflammatory, 

CC ophthalmological, antiarteriosclerotic or vulnerary activities. The 

CC sequences of the invention may be useful for gene therapy. The invention 

CC may be useful for diagnosing or treating a hypoxia-regulated condition, 

CC such as tumourigenesis, angiogenesis, apoptosis, inflammation,

CC erythropoiesis, or the biological response to hypoxia conditions

CC including processes such as glycolysis, gluconeogenesis, glucose

CC transportation, catecholamine synthesis, iron transport or nitric oxide 

CC synthesis. The disease includes cancer, ischaemic conditions, reperfusion 

CC injury, retinopathy, neonatal stress, pre-eclampsia, atherosclerosis, 

CC inflammatory conditions or wound healing. The present sequence is that of 

CC a disease related protein of the invention. 

XX 

SQ Sequence 1140 AA; 

Query Match 100.0%; Score 3601; DB 7; Length 1140; 

Best Local Similarity 100.0%; Pred. No. 3.6e-158; 

Matches 586; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy   1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60

Qy  61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120

Qy 121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180

Qy 181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240

Qy 241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300

Qy 301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360

Qy 361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420

Qy 421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480

Qy 481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540

Qy 541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586
       ||||||||||||||||||||||||||||||||||||||||||||||
Db 541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586



RESULT 4
ADE71305

ID ADE71305 standard; protein; 1192 AA.
XX

AC ADE71305;
XX

DT 29-JAN-2004 (first entry)
XX

DE Novel human protein #59.
XX

KW human; novel protein; drug.
XX

OS Homo sapiens.
XX

PN JP2002345493-A.
XX

PD 03-DEC-2002.
XX

PF 29-MAR-2001; 2002JP-00049046.
XX

PR 29-MAR-2001; 2001JP-00095524.
XX

PA (KAZU-) ZH KAZUSA DNA KENKYUSHO.
XX

DR WPI; 2003-460885/44.
DR N-PSDB; ADE71243.
XX

PT A gene and a protein encoded by it, used in drugs.
XX

PS Disclosure; Page 242-247; 257pp; Japanese.
XX

CC The invention comprises the amino acid and coding sequences of novel
CC human proteins. The DNA and protein sequences of the invention are used
CC in drugs. The present amino acid sequence represents a novel human
CC protein of the invention.
XX

SQ Sequence 1192 AA;

Query Match 100.0%; Score 3601; DB 7; Length 1192;

Best Local Similarity 100.0%; Pred. No. 3.7e-158;

Matches 586; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy   1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  53 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 112

Qy  61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 113 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 172

Qy 121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 173 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 232

Qy 181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 233 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 292

Qy 241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 293 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 352

Qy 301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 353 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 412

Qy 361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 413 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 472

Qy 421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 473 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 532

Qy 481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 533 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 592

Qy 541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586
       ||||||||||||||||||||||||||||||||||||||||||||||
Db 593 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 638



RESULT 5 
AAG79417

ID AAG79417 standard; protein; 994 AA.
XX
AC AAG79417;
XX
DT 25-OCT-2002 (first entry)
XX
DE CADHP-6, Incyte ID No: 4097936CD1.
XX
KW Human; cell adhesion protein; CADHP; AIDS; Alzheimer's disease;
KW acquired immunodeficiency syndrome; thymic dysplasia; epilepsy;
KW renal tubular acidosis; congenital glaucoma; cancer; atherosclerosis;
KW Parkinson's disease.
XX
OS Homo sapiens.
XX
FH Key Location/Qualifiers
FT Domain 1..609
FT /label= Sushi_repeat
FT /note= "Identified by BLAST-DOMO"
FT Peptide 1..29
FT /label= Signal_peptide
FT /note= "Identified by HMMER"
FT Peptide 1..28
FT /label= Signal_peptide
FT /note= "Identified by HMMER"
FT Peptide 1..25
FT /label= Signal_peptide
FT /note= "Identified by HMMER"
FT Peptide 1..24
FT /label= Signal_peptide
FT /note= "Identified by HMMER"
FT Peptide 1..22
FT /label= Signal_peptide
FT /note= "Identified by HMMER"
FT Peptide 1..20
FT /label= Signal_peptide
FT /note= "Identified by HMMER"
FT Peptide 1..20
FT /label= Signal_cleavage
FT /note= "Identified by SPSCAN"
FT Peptide 1..19
FT /label= Signal_peptide
FT /note= "Identified by HMMER"
FT Peptide 1..18
FT /label= Signal_peptide
FT /note= "Identified by HMMER"
FT Peptide 1..16
FT /label= Signal_peptide
FT /note= "Identified by HMMER"
FT Modified-site 30
FT /note= "Potentially phosphorylated"
FT Modified-site 38
FT /note= "Potentially phosphorylated"
FT Domain 101..131
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Domain 120..131
FT /label= EGF-like_domain_signature_2
FT /note= "Identified by MOTIFS"
FT Domain 120..131
FT /label= EGF-like_domain_signature_1
FT /note= "Identified by MOTIFS"
FT Binding-site 127..129
FT /label= Cell_attachemnt_sequence
FT /note= "Identified by MOTIFS"
FT Peptide 133..161
FT /label= Type_III_EGF-like_signature
FT /note= "Identified by BLIMPS-PRINTS"
FT Domain 138..576
FT /label= Sushi_repeat
FT /note= "Identified by BLAST-DOMO"
FT Domain 144..174
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Modified-site 152
FT /note= "Potentially glycosylated"
FT Modified-site 153
FT /note= "Potentially glycosylated"
FT Modified-site 154
FT /note= "Potentially phosphorylated"
FT Domain 187..216
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Domain 205..216
FT /label= EGF-like_domain_signature_1
FT /note= "Identified by MOTIFS"
FT Domain 229..259
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Domain 248..259
FT /label= EGF-like_domain_signature_1
FT /note= "Identified by MOTIFS"
FT Domain 248..259
FT /label= EGF-like_domain_signature_2
FT /note= "Identified by MOTIFS"
FT Domain 248..252
FT /label= Sushi_domain_protein
FT /note= "Identified by BLIMPS-PFAM"
FT Modified-site 271
FT /note= "Potentially glycosylated"
FT Domain 272..302
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Peptide 284..302
FT /label= Type_III_EGF-like_signature
FT /note= "Identified by BLIMPS-PRINTS"
FT Domain 291..302
FT /label= EGF-like_domain_signature_1
FT /note= "Identified by MOTIFS"
FT Domain 291..302
FT /label= EGF-like_domain_signature_2
FT /note= "Identified by MOTIFS"
FT Domain 315..345
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Domain 334..345
FT /label= EGF-like_domain_signature_1
FT /note= "Identified by MOTIFS"
FT Domain 334..345
FT /label= EGF-like_domain_signature_2
FT /note= "Identified by MOTIFS"
FT Modified-site 346
FT /note= "Potentially phosphorylated"
FT Modified-site 355
FT /note= "Potentially phosphorylated"
FT Domain 365..391
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Domain 380..391
FT /label= EGF-like_domain_signature_1
FT /note= "Identified by MOTIFS"
FT Domain 380..391
FT /label= EGF-like_domain_signature_2
FT /note= "Identified by MOTIFS"
FT Modified-site 392
FT /note= "Potentially glycosylated"
FT Domain 404..434
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Domain 423..434
FT /label= EGF-like_domain_signature_1
FT /note= "Identified by MOTIFS"
FT Domain 423..434
FT /label= EGF-like_domain_signature_2
FT /note= "Identified by MOTIFS"
FT Modified-site 446
FT /note= "Potentially glycosylated"
FT Domain 447..477
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Modified-site 448
FT /note= "Potentially phosphorylated"
FT Modified-site 460
FT /note= "Potentially phosphorylated"
FT Domain 466..477
FT /label= EGF-like_domain_signature_2
FT /note= "Identified by MOTIFS"
FT Modified-site 476
FT /note= "Potentially glycosylated"
FT Domain 490..520
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Modified-site 491
FT /note= "Potentially glycosylated"
FT Domain 509..520
FT /label= EGF-like_domain_signature_1
FT /note= "Identified by MOTIFS"
FT Domain 509..520
FT /label= EGF-like_domain_signature_2
FT /note= "Identified by MOTIFS"
FT Domain 533..563
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Modified-site 535
FT /note= "Potentially phosphorylated"
FT Domain 552..563
FT /label= EGF-like_domain_signature_1
FT /note= "Identified by MOTIFS"
FT Domain 552..563
FT /label= EGF-like_domain_signature_2
FT /note= "Identified by MOTIFS"
FT Modified-site 566
FT /note= "Potentially phosphorylated"
FT Modified-site 575
FT /note= "Potentially glycosylated"
FT Domain 576..606
FT /label= EGF-like_domain
FT /note= "Identified by HMMER-PFAM"
FT Modified-site 581
FT /note= "Potentially phosphorylated"
FT Domain 595..606
FT /label= EGF-like_domain_signature_1
FT /note= "Identified by MOTIFS"
FT Domain 595..606
FT /label= EGF-like_domain_signature_2
FT /note= "Identified by MOTIFS"
FT Domain 603..614
FT /label= Sushi_domain_protein
FT /note= "Identified by BLIMPS-PFAM"
FT Domain 619..648

Query Match 51.2%; Score 1842; DB 5; Length 994; 

Best Local Similarity 51.8%; Pred. No. 2.4e-77;

Matches 298; Conservative 61; Mismatches 210; Indels 6; Gaps 3; 

Qy 14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW---FKCT 70

M! : | I I I I I : : I : I I : I I = I ' 1

Db 9 LLLAVGLRLAGTLNPSDPNTCSFWESFTTTTKESHSRPFSLLPSEPCE--RPWEGPHTCP 66

Qy 71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130

: [ III II li :h: III Mill III II HIIIII:IM Ml Ml

Db 67 QPTWYRTVYRQWKTDHRQRLQCCHGFYESRGFCVPLCAQECVHGRCVAPNQCQCVPGW 126

Qy 131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190

| : M | I III I III: 1:1 :l I I :l : I I I II

Db 127 RGDDCSSECAPGMWGPQCDKPCSCGNNSSCDPKSGVCSCPSGLQPPNCLQPCTPGYYGPA 186

Qy 191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250

| MM :|| II M I M II h I I I I I II II I I

Db 187 CQFRCQC-HGAPCDPQTGACFCPAERTGPSCDVSCSQGTSGFFCPSTHPCQNGGVFQTPQ 245

Qy 251 GECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDE 310

M I I I I I II : I I I I I I I I I I I I I : I I I I I I I I I I I I : I M I I : I I : : I

Db 246 GSCSCPPGWMGTICSLPCPEGFHGPNCSQECRCHNGGLCDRFTGQCRCAPGYTGDRCREE 305

Qy 311 CPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPC 370

MM : I I M I I I : I : M M I M II MM I I M : I M : I II

Db 306 CPVGRFGQDCAETCDCAPDARCFPANGACLCEHGFTGDRCTDRLCPDGFYGLSCQAPCTC 365

Qy 371 HLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTC 430

|:: MMMIIMI IIMIMMMI :| IM I I :l I : M I I

Db 366 DREHSLSCHPMNGECSCLPGWAGLHCNESCPQDTHGPGCQEHCLCLHGGVCQATSGLCQC 425

Qy 431 APGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGT 490

MM I M: II MMMMII Ml MMMIIMI MM II M

Db 426 APGYTGPHCASLCPPDTYGWCSARCSCENAIACSPIDGECVCKEGWQRGNCSVPCPPGT 485

Qy 491 WGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGC 550

| || || MM : M I I I I I M I Mill I : I I I MM Mill

Db 486 WGFSCNASCQCAHEAVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASRCDCDHSDGC 545

Qy 551 HPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585

I | I : I I I II II I I

Db 546 DPVHGRCQCQAGWMGARCHLSCPEGLWGVNCSNTC 580
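
The percentages in the result summaries appear to follow from the counts printed beside them. The sketch below is a minimal illustration in Python, not GenCore's own code. It assumes, as an inference from the figures in this listing rather than from any GenCore documentation, that Query Match is the hit score divided by the perfect score, and that Best Local Similarity is the number of identical columns divided by all aligned columns (matches + conservative + mismatches + indels). The function names are hypothetical.

```python
PERFECT_SCORE = 3601  # "Perfect score" reported in the header of this listing


def query_match_percent(score: float, perfect_score: float = PERFECT_SCORE) -> float:
    """Hit score as a percentage of the perfect (self-alignment) score.

    Assumption: this ratio is inferred from the printed figures.
    """
    return 100.0 * score / perfect_score


def best_local_similarity_percent(matches: int, conservative: int,
                                  mismatches: int, indels: int) -> float:
    """Identical columns as a percentage of all aligned columns.

    Assumption: the denominator is the sum of the four printed counts.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


if __name__ == "__main__":
    # RESULT 5: Score 1842; Matches 298; Conservative 61; Mismatches 210; Indels 6
    print(round(query_match_percent(1842), 1))
    print(round(best_local_similarity_percent(298, 61, 210, 6), 1))
    # RESULT 6: Score 1667.5; Matches 284; Conservative 61; Mismatches 181; Indels 105
    print(round(query_match_percent(1667.5), 1))
    print(round(best_local_similarity_percent(284, 61, 181, 105), 1))
```

Under these assumptions the values reproduce the percentages printed for RESULT 5 and RESULT 6, which is a useful cross-check when a score or count in the listing is hard to read.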



RESULT 6 
AAB66267 

ID AAB66267 standard; protein; 1050 AA. 
XX 



AC AAB66267;
XX 

DT 05-APR-2001 (first entry) 
XX 

DE Human TANGO 272 SEQ ID NO: 14. 



XX

KW Membrane associated protein; secreted protein; human; mouse; rat;
KW INTERCEPT 340; MANGO 003; MANGO 347; TANGO 272; TANGO 295; TANGO 354;
KW TANGO 378; skeletal disorder; cardiovascular disorder; renal disorder;
KW haematopoietic disorder; neural disorder; hepatic disorder;
KW neoplastic disease.
XX 

OS Homo sapiens. 
XX 

PN WO200100673-A1. 
XX 

PD 04-JAN-2001. 
XX 

PF 29-JUN-2000; 2000WO-US018198.
XX

PR 30-JUN-1999; 99US-00345464.
XX 

PA (MILL-) MILLENNIUM PHARM INC. 
XX 

PI Barnes TM, Fraser CC, Wrighton N, Myers P, Busfield SJ, Sharp JD; 
XX 

DR WPI; 2001-050128/06. 

DR N-PSDB; AAF27787. 
XX 

PT Isolated secreted or transmembrane proteins are used for diagnosis and 

PT treatment of neoplastic and hematopoietic disorders e.g. T cell 

PT disorders, cancer and tumors. 
XX 

PS Claim 9; Page 227-229; 294pp; English. 
XX 

CC The present invention provides the protein and coding sequences for a 

CC number of membrane associated and secreted proteins from human, mouse and 

CC rat. The proteins are designated INTERCEPT 340, MANGO 003, MANGO 347, 

CC TANGO 272, TANGO 295, TANGO 254 and TANGO 378. The proteins are all 

CC involved in signal transduction and the sequences can be used in the 

CC treatment of cardiovascular, renal, hepatic, neural, neoplastic, skeletal 

CC and haematopoietic disorders 

XX 

SQ Sequence 1050 AA; 

Query Match 46.3%; Score 1667.5; DB 4; Length 1050;

Best Local Similarity 45.0%; Pred. No. 2.6e-69; 

Matches 284; Conservative 61; Mismatches 181; Indels 105; Gaps 8;

Qy 14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW 66

M | : I M I I I I I I M = : I : I I: I I = I '

Db 9 LLLAVGLRLAGTLNPSDPNTCSFWESFTTTTKESHSRPFSLLPSEPCE--RPWEGPHTCP 66

Qy 67 FKCTRHRVSYR TAY 80

Db 67 SPQTQRKLLASRDSFCMVCVGAGVQWRDRSALQPQTGNALSMRPQPRVLSGAPSLASPGH 126

Qy 81 RHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGWGGTNCSSA-- 138

|| s|:: Ml Mill Ml II : Mill Ml Ml I : I I I I

Db 127 TVWKTDHRQRLQCCHGFYESRGFCVPLCAQECVHGRCVAPNQCQCVPGWRGDDCSSAPN 186

Qy 139 CDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQR 194

I ::M I MM Ml Ml II I I I I I M MM

Db 187 CLQPCTPGYYGPACQFRCQC-HGAPCDPQTGACFCPAERTGPSCDVSCSQGT 237

Qy 195 CQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECS 254

Y : I I I II I I M I M M I

Db 238 SGFFC PSTH P CQNGGVFQT PQGS C S 262

Qy 255 CPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVG 314

i | M M M I IMM I I I I M M M I I I I I M II M I M I M I MM I I M

Db 263 CPPGWMGTICSLPCPEGFHGPNCSQECRCHNGGLCDRFTGQCRCAPGYTGDRCREECPVG 322

Qy 315 TYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLEN 374

: | I I I M I M : : IMM M I M I I I M

Db 323 RFGQDCAETCDCAPDARCFPANGACLCEHGFTGDRCTDRLCPDGFYGLSCQAPCTCDREH 382

Qy 375 THSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTCAPGF 434

• | | | | | : | | | : I M M I M M I : I M I M I I M I : M I M I M

Db 383 SLSCHPMNGECSCLPGWAGLHCNESCPQDTHGPGCQEHCLCLHGGVCQATSGLCQCAPGY 442

Qy 435 KGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFG 494

| | : : M II M I M M I M I I I I : M I M I I MM I

Db 443 TGPHCASLCPPDTYGVNCSARCSCENAIACSPIDGECVCKEGWQRGNCSVPCPPGTWGFS 502

Qy 495 CNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTT 554

M MM : M I Ml III I Mill I M II MM Mill I

Db 503 CNASCQCAHEAVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASRCDCDHSDGCDPVH 562

Qy 555 GHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585

I Ml M I I I M M II I I

Db 563 GRCQCQAGWMGARCHLSCPEGLWGVNCSNTC 593



RESULT 7 
ABG27639 

ID ABG27639 standard; protein; 321 AA. 
XX 

AC ABG27639;
XX

DT 18-FEB-2002 (first entry)
XX

DE Novel human diagnostic protein #27630.
XX

KW Human; chromosome mapping; gene mapping; gene therapy; forensic;
KW food supplement; medical imaging; diagnostic; genetic disorder.
XX

OS Homo sapiens.
XX

PN WO200175067-A2.
XX

PD 11-OCT-2001.
XX

PF 30-MAR-2001; 2001WO-US008631.
XX

PR 31-MAR-2000; 2000US-00540217.

PR 23-AUG-2000; 2000US-00649167.
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Drmanac RT, Liu C, Tang YT; 
XX 

DR WPI; 2001-639362/73. 

DR N-PSDB; AAS91826. 
XX 

PT New isolated polynucleotide and encoded polypeptides, useful in 

PT diagnostics, forensics, gene mapping, identification of mutations 

PT responsible for genetic disorders or other traits and to assess 

PT biodiversity. 
XX 

PS Claim 20; SEQ ID NO 57998; 103pp; English. 

XX 

CC The invention relates to isolated polynucleotide (I) and polypeptide (II) 

CC sequences. (I) is useful as hybridisation probes, polymerase chain 

CC reaction (PCR) primers, oligomers, and for chromosome and gene mapping, 

CC and in recombinant production of (II). The polynucleotides are also used 

CC in diagnostics as expressed sequence tags for identifying expressed 

CC genes. (I) is useful in gene therapy techniques to restore normal 

CC activity of (II) or to treat disease states involving (II). (II) is 

CC useful for generating antibodies against it, detecting or quantitating a 

CC polypeptide in tissue, as molecular weight markers and as a food 

CC supplement. (II) and its binding partners are useful in medical imaging 

CC of sites expressing (II). (I) and (II) are useful for treating disorders 

CC involving aberrant protein expression or biological activity. The 

CC polypeptide and polynucleotide sequences have applications in 

CC diagnostics, forensics, gene mapping, identification of mutations 

CC responsible for genetic disorders or other traits to assess biodiversity 

CC and to produce other types of data and products dependent on DNA and 

CC amino acid sequences. ABG00010-ABG30377 represent novel human diagnostic 

CC amino acid sequences of the invention. Note: The sequence data for this 

CC patent did not appear in the printed specification, but was obtained in 

CC electronic format directly from WIPO at 

CC ftp.wipo.int/pub/published_pct_sequences 

XX 

SQ Sequence 321 AA; 

Query Match 40.7%; Score 1466; DB 4; Length 321; 

Best Local Similarity 90.3%; Pred. No. 2.1e-60; 

Matches 241; Conservative 0; Mismatches 0; Indels 26; Gaps 2; 
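
The percentages in each result summary appear to follow from the raw counts. Query Match is consistent with Score divided by the perfect score (3601 for this query), and Best Local Similarity is consistent with Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). The Python sketch below checks that reading against the figures in this report. The formulas and the half-up rounding are assumptions inferred from the numbers here, not taken from GenCore documentation, and the function names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

# "Perfect score" reported in the header of this search run.
PERFECT_SCORE = 3601


def pct(numerator, denominator):
    """Return numerator/denominator as a percentage, rounded half-up to 0.1."""
    value = Decimal(str(numerator)) * 100 / Decimal(str(denominator))
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


def query_match(score, perfect_score=PERFECT_SCORE):
    """Assumed definition: alignment score as a share of the perfect score."""
    return pct(score, perfect_score)


def best_local_similarity(matches, conservative, mismatches, indels):
    """Assumed definition: identical columns as a share of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return pct(matches, columns)


if __name__ == "__main__":
    # Counts taken from the summary of RESULT 7 (ABG27639).
    print(query_match(1466))
    print(best_local_similarity(241, 0, 0, 26))
```

Under these assumptions the same two functions also reproduce the percentages listed for the other results in this section, which is some evidence that the inferred definitions are the ones in use.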



Qy 137 SACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQ 196 
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   9 SACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQ 68 

Qy 197 CQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCP 256 
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  69 CQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCP 128 

Qy 257 SGWM------------------GTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHC 298 
       ||||                  ||||||||||||||||||||||||||||||||||||||
Db 129 SGWMLSFPGWRPIXFSKSLXMQGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHC 188 

Qy 299 SPGYTGER--------CQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERC 350 
       ||||||||        ||||||||||||||||||||||||||||||||||||||||||||
Db 189 SPGYTGERAAVPDVRKCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERC 248 

Qy 351 EARLCPEGLYGIKCDKRCPCHLENTHS 377 
       |||||||||||||||||||||||||||
Db 249 EARLCPEGLYGIKCDKRCPCHLENTHS 275 



RESULT 8 
ADA21141 

ID ADA21141 standard; protein; 1350 AA. 
XX 

AC ADA21141; 
XX 

DT 20-NOV-2003 (first entry) 
XX 

DE Human secreted protein SECP-46 SEQ ID NO: 46. 
XX 
KW human; secreted protein; SECP; anti-HIV; antiallergic; antiinflammatory; 
KW antianaemic; antiparkinsonian; nootropic; anticonvulsant; 
KW antiarteriosclerotic; antiasthmatic; immunosuppressive; antithyroid; 
KW cytostatic; hepatotropic; dermatological; antidiabetic; nephrotropic; 
KW antigout; thyromimetic; neuroprotective; osteopathic; antiarthritic; 
KW antiparasitic; antihelminthic; antipsoriatic; uropathic; 
KW ophthalmological; antirheumatic; haemostatic; antibacterial; virucide; 
KW protozoacide; fungicide; gene therapy; cell proliferative disorder; 
KW arteriosclerosis; atherosclerosis; cirrhosis; hepatitis; 
KW paroxysmal nocturnal haemoglobinuria; polycythaemia vera; psoriasis; 
KW primary thrombocytopaenia; cancer; developmental disorder; 
KW renal tubular acidosis; anaemia; mental retardation; 
KW neurological disorder; Alzheimer's disease; Parkinson's disease; 
KW epilepsy; autoimmune disorder; inflammatory disorder; AIDS; allergy; 
KW asthma; autoimmune thyroiditis; contact dermatitis; Crohn's disease; 
KW diabetes mellitus; glomerulonephritis; Goodpasture's syndrome; gout; 
KW Graves' disease; Hashimoto's thyroiditis; irritable bowel syndrome; 
KW multiple sclerosis; osteoarthritis; osteoporosis; pancreatitis; 
KW Reiter's syndrome; rheumatoid arthritis; Sjogren's syndrome; uveitis; 
KW infection. 
XX 

OS Homo sapiens. 
XX 

PN WO2003068943-A2 . 
XX 

PD 21-AUG-2003. 
XX 

PF 13-FEB-2003; 2003WO-US004712. 
XX 

PR 13-FEB-2002; 2002US-0357002P. 
PR 06-MAR-2002; 2002US-0362439P. 
PR 19-MAR-2002; 2002US-0366041P. 
XX 

PA (INCY-) INCYTE GENOMICS INC. 
XX 

PI Lehr-Mason PM, Kable AE, Elliott VS, Marquis JP, Baughn MR; 

PI Chawla NK, Tran UK, Jin P, Tang YT, Zebarjadian Y, Swarnakar A; 



PI Hafalia AJA, Cocks BG, Warren BA, Emerling BM, Pearson CI, Chien D; 

PI Peterson DP, Fu GK, Yue H, Jackson AA, Jiang X, Hawkins PR, Lai PG; 

PI Khare R, Lee S, Lee SY, Richardson TW, Chang H; 
XX 

DR WPI; 2003-689669/65. 

DR N-PSDB; ADA21192. 



XX 
PT New human secreted proteins and polynucleotides, useful for diagnosing, 

PT treating or preventing autoimmune or inflammatory disorders (e.g. AIDS, 

PT allergy, asthma or anemia), multiple sclerosis, osteoporosis, cancer or 

PT hepatitis. 
XX 

PS Claim 1; Page 249-252; 295pp; English. 
XX 

CC The present sequence represents a human secreted protein (I) designated 

CC SECP-46. (I) have anti-HIV, antiallergic, antiinflammatory, antianaemic, 

CC antiparkinsonian, nootropic, anticonvulsant, antiarteriosclerotic, 

CC antiasthmatic, immunosuppressive, antithyroid, cytostatic, hepatotropic, 

CC dermatological, antidiabetic, nephrotropic, antigout, thyromimetic, 

CC neuroprotective, osteopathic, antiarthritic, antiparasitic, 

CC antihelminthic, antipsoriatic, uropathic, ophthalmological, 

CC antirheumatic, haemostatic, antibacterial, virucide, protozoacide and 

CC fungicide activities, and can be used in gene therapy. The human secreted 

CC proteins (SECP), polynucleotides, agonists and antagonists of the present 

CC invention are useful for diagnosing, treating or preventing disorders 

CC associated with aberrant expression of SECP, particularly cell 

CC proliferative disorders (e.g. arteriosclerosis, atherosclerosis, 

CC cirrhosis, hepatitis, paroxysmal nocturnal haemoglobinuria, polycythaemia 

CC vera, psoriasis, primary thrombocytopaenia or cancer), developmental 

CC disorders (e.g. renal tubular acidosis, anaemia or mental retardation), 

CC neurological disorders (e.g. Alzheimer's disease, Parkinson's disease or 

CC epilepsy), autoimmune/inflammatory disorders (e.g. AIDS, allergies, 

CC asthma, autoimmune thyroiditis, contact dermatitis, Crohn's disease, 

CC diabetes mellitus, glomerulonephritis, Goodpasture's syndrome, gout, 

CC Graves' disease, Hashimoto's thyroiditis, irritable bowel syndrome, 

CC multiple sclerosis, osteoarthritis, osteoporosis, pancreatitis, Reiter's 

CC syndrome, rheumatoid arthritis, Sjogren's syndrome, uveitis), or viral, 

CC bacterial, fungal, parasitic, protozoan or helminthic infections. The 

CC SECP and polynucleotides are also useful in assessing the effects of 

CC exogenous compounds on the expression of nucleic acids secreted proteins. 

CC The polynucleotides encoding SECP are useful for creating transgenic 

CC animals to model human disease. 

XX 
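
Each database entry in this listing is laid out as lines that begin with a two-letter code (ID, AC, DT, DE, KW, OS, PN, PD, PF, PR, PA, PI, DR, PT, PS, CC, SQ), with XX lines as separators. A minimal Python sketch of grouping such lines by code is given here. The function name parse_record and the choice to join continuation lines with a single space are assumptions made for illustration, not part of any GenCore or Geneseq tool.

```python
from collections import defaultdict


def parse_record(lines):
    """Group flat-file lines by their two-letter line code.

    Blank lines and 'XX' separator lines are skipped. Lines that do not
    start with a two-letter upper-case code (for example alignment rows)
    are ignored.
    """
    fields = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line == "XX":
            continue
        code, _, text = line.partition(" ")
        if len(code) == 2 and code.isalpha() and code.isupper():
            fields[code].append(text.strip())
    # Continuation lines that share a code are joined into one string.
    return {code: " ".join(parts) for code, parts in fields.items()}


# A few lines copied from the ADA21141 entry in this report.
record = [
    "ID ADA21141 standard; protein; 1350 AA.",
    "XX",
    "AC ADA21141;",
    "XX",
    "DT 20-NOV-2003 (first entry)",
    "XX",
    "PT allergy, asthma or anemia), multiple sclerosis, osteoporosis, cancer or",
    "PT hepatitis.",
    "Query Match 37.7%; Score 1356.5; DB 6; Length 1350;",
]
parsed = parse_record(record)

if __name__ == "__main__":
    for code, text in parsed.items():
        print(code, "->", text)
```

Joining by code keeps multi-line fields such as PT and CC readable as single strings. A stricter parser would also need to handle fields that OCR has split across lines.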

SQ Sequence 1350 AA; 

Query Match 37.7%; Score 1356.5; DB 6; Length 1350; 

Best Local Similarity 42.6%; Pred. No. 6.1e-55; 

Matches 232; Conservative 50; Mismatches 207; Indels 56; Gaps 10; 

Qy 92 SQC---CP-GFYESGEMCVPHCADKCVH-GRC-IAPNTCQCEPGWGGTNCSSACDGDHWG 145 

|:| 111:111 i : I : I I I I I III I :l Ml Ml 

Db 706 SRCQDVCPAGWY--GPSCQTRCS--CANDGHCHPATGHCSCAPGWTGFSCQRACDTGHWG 761 

Qy 146 PHCTSRCQCKNG-ALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCD 204 

||: || | |: |: : I Ml :| II M I I IMIMI II 

Db 762 PDCSHPCNCSAGHGSCDAISGLCLCEAGYVGPRCEQQCPQGHFGPGCEQLCQCQHGAACD 821 



Qy 205 HVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVC 264 

| | • | | | | 1 : | | | | Ml | I I I I I I I I i I : I I I 

Db 822 HVSGACTCPAGWRGTFCEHACPAGFFGLDCRSACNCTAGAACDAVNGSCLCPAGRRGPRC 881 

Qy 265 GQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300 

: | | : I I I I I I I I I = I ' I I I I I : I 

Db 882 AETCPAHTYGHNCSQACACFNGASCDPVHGQCHCAPGWMGPSCLQECLPRDVRAGCRHSG 941 



Qy 301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLC 341 

| : | | : : | I I I : I I I : I I I I : I I : I I I I 

Db 942 GCLNGGLCDPHTGRCLCPAGWTGDKCQSPCLRGWFGEACAQRCSCPPGAACHHVTGACRC 1001 

Qy 342 EAGFAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCS 401 

M | M : I I I : I I : I I M : I I I : I I : I 1=1 I : ' 

Db 1002 PPGFTGSGCE-QACPPGSFGEDCAQMCQCPGENP-ACHPATGTCSCAAGYHGPSCQQRCP 1059 

Qy 402 PGFYGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKND 461 

illl I: hi I II II: II I I M I lh II I =1 lh Ml 
Db 1060 PGRYGPGCEQLCGCLNGGSCDAATGACRCPTGFLGTDCNLTCPQGRFGPNCTHVCGCGQG 1119 

Qy 462 AVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWR 521 

Y | | || | : | | | M I M : I I I I l ; : I : i : I 11 

Db 1120 AACDPVTGTCLCPPGRAGVRCERGCPQNRFGVGCEHTCSCRNGGLCHASNGSCSCGLGWT 1179 

Qy 522 GEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNC 581 

Y || ! | ||: I I II Ml II: I I : I I I I 

Db 1180 GRHCELACPPGRYGAACHLECSCHNNSTCEPATGTCRCGPGFYGQACEHPCPPGFHGAGC 1239 

Qy 582 SLPCY 586 

I : 

Db 1240 QGLCW 1244 



RESULT 9 
ADD78227 

ID ADD78227 standard; protein; 1261 AA. 
XX 

AC ADD78227; 
XX 

DT 29-JAN-2004 (first entry) 
XX 

DE Human CGDD-8. 
XX 
KW Anabolic; Hypertensive; Respiratory; Anti-HIV; Antiallergic; 
KW Neuroprotective; Nootropic; Antianemic; Antiarteriosclerotic; 
KW Antiinflammatory; Ophthalmological; Muscular; Hepatotropic; 
KW Neuroprotective; Antiasthmatic; Anticonvulsant; Virucide; Antibacterial; 
KW Fungicide; Antiparasitic; Protozoacide; Antihelminthic; Cytostatic; 
KW Cerebroprotective; Antiparkinsonian; Antipsoriatic; Antigout; 
KW Antidiabetic; Antiarthritic; Antirheumatic; Osteopathic; Gene therapy; 
KW human; cell growth; cell differentiation; cell death; CGDD; 
KW cell proliferative disorder; cancer; developmental disorder; 
KW neurological disorder; autoimmune disorder; inflammatory disorder; 
KW infection; reproductive disorder. 
XX 

OS Homo sapiens. 
XX 



PN WO2003077875-A2 . 
XX 

PD 25-SEP-2003. 
XX 

PF 14-MAR-2003; 2003WO-US008310. 
XX 

PR 15-MAR-2002; 2002US-0364494P. 

PR 29-MAR-2002; 2002US-0369129P. 

PR 12-APR-2002; 2002US-0372511P. 
XX 

PA (INCY-) INCYTE GENOMICS INC. 
XX 

PI Kable AE, Tran UK, Hafalia AJA, Burford N, Honchell CD; 

PI Lehr-Mason PM, Duggan BM, Ramkumar J, Griffin JA, Richardson TW; 

PI Elliott VS, Jiang X, Jackson AA, Marquis JP, Chawla NK, Khare R; 

PI Becha SD, Lee SY, Swarnakar A, Yue H, Warren BA, Baughn MR, Lai PG; 

PI Lee S, Ho A, Gandhi AR, Yao MG; 

XX 

DR WPI; 2003-779081/73. 

DR N-PSDB; ADD78266. 
XX 

PT New polypeptides and polynucleotides associated with cell growth, 

PT differentiation and death, useful for diagnosing, treating or preventing 

PT e.g. developmental, neurological, autoimmune, inflammatory or 

PT reproductive disorders. 

XX 

PS Claim 1; SEQ ID NO 8; 320pp; English. 
XX 

CC The present invention relates to novel human proteins (I; ADD78220- 

CC ADD78258) and their coding sequences (II; ADD78259-ADD78297), which are 

CC associated with cell growth, differentiation and death, referred to as 

CC CGDD-n proteins, where n is a number from 1 to 39. The CGDD proteins and 

CC their coding sequences are useful for diagnosing, treating or preventing 

CC cell proliferative disorders (e.g. cirrhosis, hepatitis, 

CC arteriosclerosis, psoriasis, primary thrombocytopenia) or cancers (e.g. 

CC adenocarcinoma, sarcoma or cancers of the bone, bone marrow, brain, 

CC breast, colon, kidney, liver, lung or uterus), developmental disorders 

CC (e.g. renal tubular acidosis, Becker muscular dystrophy, gonadal 

CC dysgenesis, hypothyroidism or seizures), neurological disorders (e.g. 

CC Pick's disease, cataract, epilepsy, ischemic cerebrovascular disease, 

CC stroke, Alzheimer's disease, Parkinson's disease or dementia), 

CC autoimmune/inflammatory disorders (e.g. AIDS, allergies, anemia, asthma, 

CC diabetes mellitus, bronchitis, osteoporosis, osteoarthritis, rheumatoid 

CC arthritis, contact dermatitis or gout), viral, bacterial, fungal, 

CC parasitic, protozoan or helminthic infections, reproductive disorders 

CC (e.g. infertility, ectopic pregnancy, premature ovarian failure, delayed 

CC puberty or prostatitis) or disorders of the placenta (e.g. preeclampsia, 

CC choriocarcinoma, placenta previa, placental or maternal floor infarction 

CC or chronic villitis). 

XX 

SQ Sequence 1261 AA; 

Query Match 36.5%; Score 1313.5; DB 7; Length 1261; 

Best Local Similarity 38.8%; Pred. No. 5.5e-53; 

Matches 231; Conservative 50; Mismatches 206; Indels 109; Gaps 



Qy 94 CCPGFYESGEMC -VPHCADKCVHGRCIAPNT CQCEPGWGGTNC 135 



I I I I S I I I : : : I : I : I : I | | : I : I I 

Db 523 CDPGLY -- GRFCHLTCPPWAFGPGCSEEC QCVQPHTQSCDKRDGSCSCKAGFRGERC 577 

Qy 136 SSACDGDHWGPHCTSRCQCKNGALCNPITGAC--HCAAGFRGWRCEDRCEQGTYGNDCHQ 193 

: I : : : | | | I I I | : : : I I I I I I : I I i I I : I : I 

Db 578 QAECELGYFGPGCWQACTCPVGVACDSVSGECGKRCPAGFQGEDCGQECPVGTFGVNCSS 637 

Qy 194 RCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGEC 253 

|| Ml llhllllllll II IM I I I I M I I I : I I I I : I I 
Db 638 SCSC-GGAPCHGVTGQCRCPPGRTGEDCEADCPEGHFGPGCEQRCQCQHGAACDHVSGAC 696 



Qy 254 SCPSGWMGTVCGQPCPEGRF 273 

: I I : I I II I I I I I 

Db 697 TCPAGWRGTFCEHACPAGFFGLDCRSACNCTAGAACDAVNGSCLCPAGRRGPRCAETCPA 756 



Qy 274 ---GKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGG 330 

I I I I i I I I I : I I llllhlhl I : i I M I 

Db 757 HTYGHNCSQACACFNGASCDPVHGQCHCAPGWMGPSCLQACPAGLYGDNCRHSCLCQNGG 816 



Qy 331 KCYHVSGACLCEAGFAGERCEAR LCPEGLYG 361 

II : I I I I I I I I I 

Db 817 TCDPVSGHCACPEGWAGLACEKECLPRDVRAGCRHSGGCLNGGLCDPHTGRCLCPAGWTG 876 



Qy 362 IKCDKRC PCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQ 410 

Ml | : II : : I I I ll::l I : I II II 1= 

Db 877 DKCQSPCLRGWFGEACAQRCSCPPGAACHHVTGACRCPPGFTGSGCEQGCPPGRYGPGCE 936 

Qy 411 QICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGS 470 

| : | | | I I I : II 1 : I I I : I I I : Ml Mill: 

Db 937 QLCGCLNGGSCDAATGACRCPTGFLGTDCNLTCPQGRFGPNCTHVCGCGQGAACDPVTGT 996 

Qy 471 CTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQ 530 

|| | Ml II : I II II I III h : I : I : i I I I MM 
Db 997 CLCPPGRAGVRCERGCPQNRFGVGCEHTCSCRNGGLCHASNGSCSCGLGWTGRHCELACP 1056 

Qy 531 DGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586 

Ml | M : I I I I I I M I : I I : I I I I I : 

Db 1057 PGRYGAACHLECSCHNNSTCEPATGTCRCGPGFYGQACEHPCPPGFHGAGCQGLCW 1112 



RESULT 10 

ABJ37903 

ID ABJ37903 standard; protein; 1403 AA. 
XX 

AC ABJ37903; 
XX 

DT 22-MAY-2003 (first entry) 
XX 

DE NOVX protein sequence SEQ ID No 52. 
XX 

KW Hepatotropic; immunosuppressive; cardiant; hypertensive; tranquilizer; 
KW vulnerary; virucide; antibacterial; protozoacide; fungicide; nootropic; 
KW antiparasitic; neuroprotective; cerebroprotective; antiparkinsonian; 
KW anticonvulsant; antiaddictive; analgesic; dermatological; keratolytic; 
KW antiseborrheic; antirheumatic; antiarthritic; antiinflammatory; anti-HIV; 
KW cytostatic; antiasthmatic; antipsoriatic; hypotensive; osteopathic; 
KW antiulcer; anorectic; antidiabetic; antiallergic; haemostatic; 
KW neuroleptic; antidepressant; antiinfertility; NOVX; human disease; 
KW NOVX-associated disorder; trauma; viral; bacterial; fungal; protozoal; 
KW parasitic infection; Alzheimer's disease; stroke; forensic biology; 
KW immunogen; non-human transgenic animal; gene therapy. 


XX 

OS Unidentified. 
XX 

PN WO200281517-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 22-JAN-2002; 2002WO-US002064. 
XX 

PR 19-JAN-2001; 2001US-0262892P. 
PR 23-JAN-2001; 2001US-0263598P. 
PR 24-JAN-2001; 2001US-0263799P. 
PR 25-JAN-2001; 2001US-0264117P. 
PR 25-JAN-2001; 2001US-0264139P. 
PR 26-JAN-2001; 2001US-0264478P. 
PR 30-JAN-2001; 2001US-0263351P. 
PR 02-MAR-2001; 2001US-0272870P. 
PR 14-MAR-2001; 2001US-0275927P. 
PR 14-MAR-2001; 2001US-0275990P. 
PR 15-MAR-2001; 2001US-0276449P. 
PR 20-MAR-2001; 2001US-0277358P. 
PR 23-MAR-2001; 2001US-0278151P. 
PR 29-MAR-2001; 2001US-0279857P. 
PR 20-APR-2001; 2001US-0285140P. 
PR 20-APR-2001; 2001US-0285141P. 
PR 30-APR-2001; 2001US-0287484P. 
PR 17-MAY-2001; 2001US-0291701P. 
PR 08-JUN-2001; 2001US-0296960P. 
PR 10-JUL-2001; 2001US-0304353P. 
PR 10-JUL-2001; 2001US-0304355P. 
PR 12-JUL-2001; 2001US-0304886P. 
PR 09-AUG-2001; 2001US-0311289P. 
PR 13-AUG-2001; 2001US-0311975P. 
PR 16-AUG-2001; 2001US-0312937P. 
PR 18-OCT-2001; 2001US-0330227P. 
PR 29-NOV-2001; 2001US-0334198P. 
XX 

PA (CURA-) CURAGEN CORP. 
XX 

PI Decristofaro MF, Padigaru M, Miller C, Tchernev V, Zhong H; 
PI Zhong M, Anderson D, Ballinger R, Gerlach V, Spytek KA, Rastelli L; 
PI Kekuda R, Guo X, Zerhusen B, Andrew D, Mezes P, Patturajan M; 
PI Burgess CE, Eisen A, Wolenc A, Baumgartner J, Shimkets RA, Gusev V; 
PI Vernet CAM, Taupier RJ, Pena C, Shenoy S, Li L, Casman S, Boldog F; 
PI Fernandes E, Smithson G, Malyankar U, Taillon B, Liu X; 
XX 

DR WPI; 2003-058504/05. 
DR N-PSDB; ABT33368. 
XX 

PT New polypeptides, designated as NOVX, useful for diagnosing and treating 
PT infections, neurological diseases, cancer, allergy, and bone, 
PT immunological, skin, renal, brain, muscle and autoimmune disorders. 
XX 

PS Claim 1; Page 133; 672pp; English. 
XX 

CC The invention relates to a novel isolated polypeptide, designated NOVX 

CC (NOV1 - 33), consisting of a mature form of one of 61 sequences, given in 

CC the specification, or its variant, where amino acid residue(s) in the 

CC variant differ from the mature form, provided that the variant differs in 

CC not more than 15 % of the amino acids from the sequence of the mature 

CC form. The NOVX polypeptides, nucleic acids encoding the polypeptides, and 

CC an antibody to the polypeptides, are useful for treating or preventing a 

CC NOVX-associated disorder in humans and for treating a syndrome associated 

CC with a human disease (NOVX-associated disorder). NOVX polypeptides and 

CC the encoding nucleic acids, are useful for determining the presence of or 

CC predisposition to a disease associated with altered levels of NOVX 

CC polypeptide and polynucleotide, by measuring the level of polypeptide 

CC expression or the amount of nucleic acid from a mammal and comparing it 

CC with another mammal not having or not predisposed to the disease. NOVX 

CC polypeptide is also useful for identifying an agent that binds to NOVX 

CC and a cell expressing NOVX is useful for identifying an agent that 

CC modulates the expression or activity of NOVX. The antibodies and a 

CC polypeptide having 95 % sequence identity to NOVX polypeptide are useful 

CC for treating a pathological state in a mammal. The antibodies are also 

CC useful for determining the presence or amount of NOVX in a sample. NOVX 

CC polypeptides, polynucleotides and antibodies specific for the 

CC polypeptides are useful for treating or preventing disorders or syndromes 

CC including trauma, viral, bacterial, fungal, protozoal, and parasitic 

CC infections. They can also treat disorders such as e.g., Alzheimer's 

CC disease or a stroke. The NOVX encoding nucleic acids are useful for 

CC expressing the NOVX proteins, to detect NOVX mRNA, or a genetic lesion in 

CC a NOVX gene and to modulate NOVX activity. NOVX sequences are also useful 

CC for identifying a cell or tissue type in a biological sample, to amplify 

CC DNA sequences from very small biological samples such as tissues e.g. 

CC hair or skin or body fluids in forensic biology and as primers and probes 

CC for use in identifying and/or cloning NOVX homologues in other cell 

CC types. The NOVX proteins are useful as an immunogen to generate 

CC antibodies which are useful for diagnostically monitoring protein levels 

CC and modulating NOVX activity. Cells comprising NOVX nucleic acids are 

CC useful for producing non-human transgenic animals which are useful for 

CC studying the function and/or activity of NOVX protein and for identifying 

CC and/or evaluating modulators of NOVX protein activity. The NOVX nucleic 

CC acids can be used in gene therapy. This sequence represents a NOVX 

CC protein of the invention 

XX 

SQ Sequence 1403 AA; 

Query Match 36.4%; Score 1310.5; DB 6; Length 1403; 

Best Local Similarity 41.8%; Pred. No. 8.2e-53; 

Matches 228; Conservative 45; Mismatches 217; Indels 55; Gaps 10; 

Qy 90 RKSQCCPGFYESGEMCVPHCADKCVH-GRCIA-PNTCQCEPGWGGTNCSSACDGDHWGPH 147 

| | |:| I I I : I : I I I I III I :l I I I MM 

Db 814 RCQDCEAGWY--GPSCQTMCS--CANDGHCHQDTGHCSCAPGWTGFSCQRACDTGHWGPD 869 

Qy 148 CTSRCQCKNG-ALCNPITGACHCAAGFRGWRCE-DRCEQGTYGNDCHQRCQCQNGATCDH 205 

| : | | I 1:1:11111:1111 I I I : I I I I I I I I : I I I I I 

Db 870 CSHPCNCSAGHGSCDAISGLCLCEAGYVGPRCEQSECPQGHFGPGCEQRCQCQHGAACDH 929 



Qy 206 VTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCG 265 



|:| 1 II I: I III 111 I N : I I I 

Db 930 VSGACTCPAGWRGTFCEHACPAGFFGLDCRSACNCTAGAACDAVNGSCLCPAGRRGPRCA 989 

Qy 266 QPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQ 325 

: | I : | I I I I I I I I : I I I I I I I : I I : I I I I I I I I : I 

Db 990 ETCPAHTYGHNCSQACACFNGASCDPVHGQCHCAPGWMGPSCLQACPAGLYGDNCRHSCL 1049 



Qy 326 CVNGGKCYHVSGACLCEAGFAGERCEAR L CP 356 

I I I I I I I I I I I : I I I I I 1 1 

Db 1050 CQNGGTCDPVSGHCACPEGWAGLACEVECLPRDVRAGCRHSGGCLNGGLCDPHTGRCLCP 1109 



Qy 357 EGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCS 401 

: I : I I : I I I II : I I = ' I 

Db 1110 AGWTGDKCQSPAACAKGTFGPHCEGRCACRWGG--PCHLATGACLCPPGWRGPHLSAACL 1167 



Qy 402 PGFYGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKND 461 

| : : | | I I I I II III I I I I I I M I I I I I : : I : I : II : 
Db 1168 RGWFGEACAQRCSCPPGAACHHVTGACRCPPGFTGSGCEQACPPGSFGEDCAQMCQCPGE 1227 

Qy 462 -AVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGW 520 

II | : | : | I | : I I I I I I I : I I I I I I I I I : I = I I I I : 

Db 1228 NPACHPATGTCSCAAGYHGPSCQQRCPPGRYGPGCEQLCGCLNGGSCDAATGACRCPTGF 1287 

Qy 521 RGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPN 580 

I ill I : I I I II I I I : I I I : I : I 2 I 

Db 1288 LGTDCNLTCPQGRFGPNCTHVCGCGQGAACDPVTGTCLCPPGRAGVRCERGCPQNRFGVG 1347 

Qy 581 CSLPC 585 

I I 

Db 1348 CEHTC 1352 



RESULT 11 




ABG08033 

ID ABG08033 standard; protein; 878 AA. 
XX 

AC ABG08033; 
XX 

DT 13-FEB-2002 (first entry) 
XX 

DE Novel human diagnostic protein #8024. 
XX 

KW Human; chromosome mapping; gene mapping; gene therapy; forensic; 
KW food supplement; medical imaging; diagnostic; genetic disorder. 
XX 

OS Homo sapiens. 
XX 

PN WO200175067-A2. 
XX 

PD 11-OCT-2001. 
XX 

PF 30-MAR-2001; 2001WO-US008631. 
XX 

PR 31-MAR-2000; 2000US-00540217. 
PR 23-AUG-2000; 2000US-00649167. 
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Drmanac RT, Liu C, Tang YT; 
XX 

DR WPI; 2001-639362/73. 

DR N-PSDB; AAS72220. 
XX 

PT New isolated polynucleotide and encoded polypeptides, useful in 

PT diagnostics, forensics, gene mapping, identification of mutations 

PT responsible for genetic disorders or other traits and to assess 

PT biodiversity. 
XX 

PS Claim 20; SEQ ID NO 38392; 103pp; English. 
XX 

CC The invention relates to isolated polynucleotide (I) and polypeptide (II) 

CC sequences. (I) is useful as hybridisation probes, polymerase chain 

CC reaction (PCR) primers, oligomers, and for chromosome and gene mapping, 

CC and in recombinant production of (II). The polynucleotides are also used 

CC in diagnostics as expressed sequence tags for identifying expressed 

CC genes. (I) is useful in gene therapy techniques to restore normal 

CC activity of (II) or to treat disease states involving (II). (II) is 

CC useful for generating antibodies against it, detecting or quantitating a 

CC polypeptide in tissue, as molecular weight markers and as a food 

CC supplement. (II) and its binding partners are useful in medical imaging 

CC of sites expressing (II). (I) and (II) are useful for treating disorders 

CC involving aberrant protein expression or biological activity. The 

CC polypeptide and polynucleotide sequences have applications in 

CC diagnostics, forensics, gene mapping, identification of mutations 

CC responsible for genetic disorders or other traits to assess biodiversity 

CC and to produce other types of data and products dependent on DNA and 

CC amino acid sequences. ABG00010-ABG30377 represent novel human diagnostic 

CC amino acid sequences of the invention. Note: The sequence data for this 

CC patent did not appear in the printed specification, but was obtained in 

CC electronic format directly from WIPO at 

CC ftp.wipo.int/pub/published_pct_sequences 

XX 

SQ Sequence 878 AA; 

Query Match 36.4%; Score 1309; DB 4; Length 878; 

Best Local Similarity 86.9%; Pred. No. 6.9e-53; 

Matches 218; Conservative 3; Mismatches 14; Indels 16; Gaps 3; 

Qy 338 ACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLEN--THSCHPMSGECACKPGWSGLY 395 
|||: : | I I II : I I I I I I I I I I I I I I I I I I I _ 

Db 5 

Qy 396 

M I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I I I 

Db 51 CNETCSPGFYGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSR 

Qy 456 CGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCT 

I I I I I I I I I I I I I I M I M I I I I I I I I I I I I M I I I I I I I I I I I I I I I M I I I I I I M II 

Db 111 

Qy 516 CAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEG 575 

| | I I I I I I M I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I M I 

Db 171 CAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEG 230 



Qy 576 RWGPNCSLPCY 586 

I I I I I I I I 1 I I 
Db 231 RWGPNCSLPCY 241 



RESULT 12 
ABJ37904 



ID ABJ37904 standard; protein; 1577 AA. 
XX 

AC ABJ37904; 
XX 

DT 22-MAY-2003 (first entry) 
XX 

DE NOVX protein sequence SEQ ID No 54. 
XX 

KW Hepatotropic; immunosuppressive; cardiant; hypertensive; tranquilizer; 
KW vulnerary; virucide; antibacterial; protozoacide; fungicide; nootropic; 
KW antiparasitic; neuroprotective; cerebroprotective; antiparkinsonian; 
KW anticonvulsant; antiaddictive; analgesic; dermatological; keratolytic; 
KW antiseborrheic; antirheumatic; antiarthritic; antiinflammatory; anti-HIV; 
KW cytostatic; antiasthmatic; antipsoriatic; hypotensive; osteopathic; 
KW antiulcer; anorectic; antidiabetic; antiallergic; haemostatic; 
KW neuroleptic; antidepressant; antiinfertility; NOVX; human disease; 
KW NOVX-associated disorder; trauma; viral; bacterial; fungal; protozoal; 
KW parasitic infection; Alzheimer's disease; stroke; forensic biology; 
KW immunogen; non-human transgenic animal; gene therapy. 


XX 

OS Unidentified. 
XX 

PN WO200281517-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 22-JAN-2002; 2002WO-US002064. 
XX 

PR 19-JAN-2001; 2001US-0262892P. 
PR 23-JAN-2001; 2001US-0263598P. 
PR 24-JAN-2001; 2001US-0263799P. 
PR 25-JAN-2001; 2001US-0264117P. 
PR 25-JAN-2001; 2001US-0264139P. 
PR 26-JAN-2001; 2001US-0264478P. 
PR 30-JAN-2001; 2001US-0263351P. 
PR 02-MAR-2001; 2001US-0272870P. 
PR 14-MAR-2001; 2001US-0275927P. 
PR 14-MAR-2001; 2001US-0275990P. 
PR 15-MAR-2001; 2001US-0276449P. 
PR 20-MAR-2001; 2001US-0277358P. 
PR 23-MAR-2001; 2001US-0278151P. 
PR 29-MAR-2001; 2001US-0279857P. 
PR 20-APR-2001; 2001US-0285140P. 
PR 20-APR-2001; 2001US-0285141P. 
PR 30-APR-2001; 2001US-0287484P. 
PR 17-MAY-2001; 2001US-0291701P. 
PR 08-JUN-2001; 2001US-0296960P. 
PR 10-JUL-2001; 2001US-0304353P. 
PR 10-JUL-2001; 2001US-0304355P. 
PR 12-JUL-2001; 2001US-0304886P. 
PR 09-AUG-2001; 2001US-0311289P. 
PR 13-AUG-2001; 2001US-0311975P. 
PR 16-AUG-2001; 2001US-0312937P. 
PR 18-OCT-2001; 2001US-0330227P. 
PR 29-NOV-2001; 2001US-0334198P. 
XX 

PA (CURA-) CURAGEN CORP. 
XX 

PI Decristofaro MF, Padigaru M, Miller C, Tchernev V, Zhong H; 

PI Zhong M, Anderson D, Ballinger R, Gerlach V, Spytek KA, Rastelli L; 

PI Kekuda R, Guo X, Zerhusen B, Andrew D, Mezes P, Patturajan M; 

PI Burgess CE, Eisen A, Wolenc A, Baumgartner J, Shimkets RA, Gusev V; 

PI Vernet CAM, Taupier RJ, Pena C, Shenoy S, Li L, Casman S, Boldog F; 

PI Fernandes E, Smithson G, Malyankar U, Taillon B, Liu X; 

XX 

DR WPI; 2003-058504/05. 

DR N-PSDB; ABT33369. 
XX 

PT New polypeptides, designated as NOVX, useful for diagnosing and treating 

PT infections, neurological diseases, cancer, allergy, and bone, 

PT immunological, skin, renal, brain, muscle and autoimmune disorders. 

XX 

PS Claim 1; Page 135-136; 672pp; English. 
XX 

CC The invention relates to a novel isolated polypeptide, designated NOVX 

CC (NOV1 - 33), consisting of a mature form of one of 61 sequences, given in 

CC the specification, or its variant, where amino acid residue(s) in the 

CC variant differ from the mature form, provided that the variant differs in 

CC not more than 15 % of the amino acids from the sequence of the mature 

CC form. The NOVX polypeptides, nucleic acids encoding the polypeptides, and 

CC an antibody to the polypeptides, are useful for treating or preventing a 

CC NOVX-associated disorder in humans and for treating a syndrome associated 

CC with a human disease (NOVX-associated disorder). NOVX polypeptides and 

CC the encoding nucleic acids, are useful for determining the presence of or 

CC predisposition to a disease associated with altered levels of NOVX 

CC polypeptide and polynucleotide, by measuring the level of polypeptide 

CC expression or the amount of nucleic acid from a mammal and comparing it 

CC with another mammal not having or not predisposed to the disease. NOVX 

CC polypeptide is also useful for identifying an agent that binds to NOVX 

CC and a cell expressing NOVX is useful for identifying an agent that 

CC modulates the expression or activity of NOVX. The antibodies and a 

CC polypeptide having 95 % sequence identity to NOVX polypeptide are useful 

CC for treating a pathological state in a mammal. The antibodies are also 

CC useful for determining the presence or amount of NOVX in a sample. NOVX 

CC polypeptides, polynucleotides and antibodies specific for the 

CC polypeptides are useful for treating or preventing disorders or syndromes 

CC including trauma, viral, bacterial, fungal, protozoal, and parasitic 

CC infections. They can also treat disorders such as e.g., Alzheimer's 

CC disease or a stroke. The NOVX encoding nucleic acids are useful for 

CC expressing the NOVX proteins, to detect NOVX mRNA, or a genetic lesion in 

CC a NOVX gene and to modulate NOVX activity. NOVX sequences are also useful 

CC for identifying a cell or tissue type in a biological sample, to amplify 

CC DNA sequences from very small biological samples such as tissues e.g. 

CC hair or skin or body fluids in forensic biology and as primers and probes 

CC for use in identifying and/or cloning NOVX homologues in other cell 

CC types. The NOVX proteins are useful as an immunogen to generate 

CC antibodies which are useful for diagnostically monitoring protein levels 



CC and modulating NOVX activity. Cells comprising NOVX nucleic acids are 
CC useful for producing non-human transgenic animals which are useful for 
CC studying the function and/or activity of NOVX protein and for identifying 
CC and/or evaluating modulators of NOVX protein activity. The NOVX nucleic 
CC acids can be used in gene therapy. This sequence represents a NOVX 
CC protein of the invention 
XX 

SQ Sequence 1577 AA; 

Query Match 36.2%; Score 1303; DB 6; Length 1577; 

Best Local Similarity 37.1%; Pred. No. 2e-52; 

Matches 237; Conservative 45; Mismatches 207; Indels 150; Gaps 13; 



Qy 94 CCPGFYESGEMCV PHCADKCV— HGRCIA-PNTCQCEPGWGGTNCSSACDG 141 

I III II I I I II 111111:1111 

Db 810 CLPGFV--GSRCQDCEAGWYGPSCQTMCSCANDGHCHQDTGHCSCAPGWTGFSCQRACDT 867 

Qy 142 DHWGPHCTSRCQCKNG-ALCNPITGACHCAAGFRGWRCE-DRCEQGTYGNDCHQRCQCQN 199 

Mill: II I I : I : I I I I I : I I I I I I I : I I I I I I I I : 

Db 868 GHWGPDCSHPCNCSAGHGSCDAISGLCLCEAGYVGPRCEQSECPQGHFGPGCEQRCQCQH 927 



Qy 200 GATCDHVTGECRCPPGYTGAFCED LCPPGK 229 

I I I I I I : I I I I I : I I I I III I : 

Db 928 GAACDHVSGACTCPAGWRGTFCEHACPAGFFGLDCRSACNCTAGAACDAVNGSCLCPAGR 987 

Qy 230 HGPQ CEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGK 275 

||: I I I ! I I I I I : I I I I I I I I I I I : I 

Db 988 RGPRCAESACPAHTYGHNCSQACACFNGASCDPVHGQCHCAPGWMGPSCLQACPAGLYGD 1047 

Qy 276 NCSQECQCHNGGTCDAA 292 

II I I I I I I I I 

Db 1048 NCRHSCLCQNGGTCDPVSGHCACPEGWAGLACEVECLPRDVRAGCRHSGGCLNGGLCDPH 1107 

Qy 293 TGQCHCS PGYTGERCQDE CPV 313 

I I : I I I : I I : : I I M 

Db 1108 TGRCLCPAGWTGDKCQSPAACAKGTFGPHCEGRCACRWGGPCHLATGACLCPPGWRGPHL 1167 

Qy 314 GTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKR 367 

I : I I I : I I I I : I I : I I I I III I I : I I I : I I s 
Db 1168 SAACLRGWFGEACAQRCSCPPGAACHHVTGACRCPPGFTGSGCE-QACPPGSFGEDCAQM 1226 

Qy 368 CPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGK 427 

|| I i : I I I : I I : I I : I I : I I I II I : I : I I i I I I : I I 

Db 1227 CQCPGENP-ACHPATGTCSCAAGYHGPSCQQRCPPGRYGPGCEQLCGCLNGGSCDAATGA 1285 

Qy 428 CTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCP 487 

II Mill: I I t : I I I : III Ml I III II 

Db 1286 CRCPTGFLGTDCNLTCPQGRFGPNCTHVCGCGQGAACDPVTGTCLCPPGRAGVRCERGCP 1345 

Qy 488 SGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHA 547 

: I II II I I II I : : I M M Ml II I I III I I I : 

Db 1346 QNRFGVGCEHTCSCRNGGLCHASNGSCSCGLGWTGRHCELACPPGRYGAACHLECSCHNN 1405 



Qy 548 DGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586 

I I I I M I I I : I I : I 1 I I I : 
Db 1406 STCEPATGTCRCGPGFYGQACEHPCPPGFHGAGCQGLCW 1444 



RESULT 13 
ABJ37900 

ID ABJ37900 standard; protein; 1398 AA. 
XX 

AC ABJ37900; 
XX 

DT 22-MAY-2003 (first entry) 
XX 

DE NOVX protein sequence SEQ ID No 46. 
XX 

KW Hepatotropic; immunosuppressive; cardiant; hypertensive; tranquilizer; 

KW vulnerary; virucide; antibacterial; protozoacide; fungicide; nootropic; 

KW antiparasitic; neuroprotective; cerebroprotective; antiparkinsonian; 

KW anticonvulsant; antiaddictive; analgesic; dermatological; keratolytic; 

KW antiseborrheic; antirheumatic; antiarthritic; antiinflammatory; anti-HIV; 

KW cytostatic; antiasthmatic; antipsoriatic; hypotensive; osteopathic; 

KW antiulcer; anorectic; antidiabetic; antiallergic; haemostatic; 

KW neuroleptic; antidepressant; antiinfertility; NOVX; human disease; 

KW NOVX-associated disorder; trauma; viral; bacterial; fungal; protozoal; 

KW parasitic infection; Alzheimer's disease; stroke; forensic biology; 

KW immunogen; non-human transgenic animal; gene therapy. 

XX 

OS Unidentified. 
XX 

PN WO200281517-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 22-JAN-2002; 2002WO-US002064. 
XX 

PR 19-JAN-2001; 2001US-0262892P. 

PR 23-JAN-2001; 2001US-0263598P. 

PR 24-JAN-2001; 2001US-0263799P. 

PR 25-JAN-2001; 2001US-0264117P. 

PR 25-JAN-2001; 2001US-0264139P. 

PR 26-JAN-2001; 2001US-0264478P. 

PR 30-JAN-2001; 2001US-0263351P. 

PR 02-MAR-2001; 2001US-0272870P. 

PR 14-MAR-2001; 2001US-0275927P. 

PR 14-MAR-2001; 2001US-0275990P. 

PR 15-MAR-2001; 2001US-0276449P. 

PR 20-MAR-2001; 2001US-0277358P. 

PR 23-MAR-2001; 2001US-0278151P. 

PR 29-MAR-2001; 2001US-0279857P. 

PR 20-APR-2001; 2001US-0285140P. 

PR 20-APR-2001; 2001US-0285141P. 

PR 30-APR-2001; 2001US-0287484P. 

PR 17-MAY-2001; 2001US-0291701P. 

PR 08-JUN-2001; 2001US-0296960P. 

PR 10-JUL-2001; 2001US-0304353P. 

PR 10-JUL-2001; 2001US-0304355P. 

PR 12-JUL-2001; 2001US-0304886P. 

PR 09-AUG-2001; 2001US-0311289P. 

PR 13-AUG-2001; 2001US-0311975P. 

PR 16-AUG-2001; 2001US-0312937P. 

PR 18-OCT-2001; 2001US-0330227P. 

PR 29-NOV-2001; 2001US-0334198P. 
XX 

PA (CURA-) CURAGEN CORP. 
XX 

PI Decristofaro MF, Padigaru M, Miller C, Tchernev V, Zhong H; 

PI Zhong M, Anderson D, Ballinger R, Gerlach V, Spytek KA, Rastelli L; 

PI Kekuda R, Guo X, Zerhusen B, Andrew D, Mezes P, Patturajan M; 

PI Burgess CE, Eisen A, Wolenc A, Baumgartner J, Shimkets RA, Gusev V; 

PI Vernet CAM, Taupier RJ, Pena C, Shenoy S, Li L, Casman S, Boldog F; 

PI Fernandes E, Smithson G, Malyankar U, Taillon B, Liu X; 

XX 

DR WPI; 2003-058504/05. 

DR N-PSDB; ABT33365. 

XX 

PT New polypeptides, designated as NOVX, useful for diagnosing and treating 

PT infections, neurological diseases, cancer, allergy, and bone, 

PT immunological, skin, renal, brain, muscle and autoimmune disorders. 

XX 

PS Claim 1; Page 127; 672pp; English. 
XX 

CC The invention relates to a novel isolated polypeptide, designated NOVX 

CC (NOV1 - 33), consisting of a mature form of one of 61 sequences, given in 

CC the specification, or its variant, where amino acid residue (s) in the 

CC variant differ from the mature form, provided that the variant differs in 

CC not more than 15 % of the amino acids from the sequence of the mature 

CC form. The NOVX polypeptides, nucleic acids encoding the polypeptides, and 

CC an antibody to the polypeptides, are useful for treating or preventing a 

CC NOVX-associated disorder in humans and for treating a syndrome associated 

CC with a human disease (NOVX-associated disorder) . NOVX polypeptides and 

CC the encoding nucleic acids, are useful for determining the presence of or 

CC predisposition to a disease associated with altered levels of NOVX 

CC polypeptide and polynucleotide, by measuring the level of polypeptide 

CC expression or the amount of nucleic acid from a mammal and comparing it 

CC with another mammal not having or not predisposed to the disease. NOVX 

CC polypeptide is also useful for identifying an agent that binds to NOVX 

CC and a cell expressing NOVX is useful for identifying an agent that 

CC modulates the expression or activity of NOVX. The antibodies and a 

CC polypeptide having 95 % sequence identity to NOVX polypeptide are useful 

CC for treating a pathological state in a mammal. The antibodies are also 

CC useful for determining the presence or amount of NOVX in a sample. NOVX 

CC polypeptides, polynucleotides and antibodies specific for the 

CC polypeptides are useful for treating or preventing disorders or syndromes 

CC including trauma, viral, bacterial, fungal, protozoal, and parasitic 

CC infections. They can also treat disorders such as e.g., Alzheimer's 

CC disease or a stroke. The NOVX encoding nucleic acids are useful for 

CC expressing the NOVX proteins, to detect NOVX mRNA, or a genetic lesion in 

CC a NOVX gene and to modulate NOVX activity. NOVX sequences are also useful 

CC for identifying a cell or tissue type in a biological sample, to amplify 

CC DNA sequences from very small biological samples such as tissues e.g. 

CC hair or skin or body fluids in forensic biology and as primers and probes 

CC for use in identifying and/or cloning NOVX homologues in other cell 

CC types. The NOVX proteins are useful as an immunogen to generate 

CC antibodies which are useful for diagnostically monitoring protein levels 

CC and modulating NOVX activity. Cells comprising NOVX nucleic acids are 

CC useful for producing non-human transgenic animals which are useful for 

CC studying the function and/or activity of NOVX protein and for identifying 

CC and/or evaluating modulators of NOVX protein activity. The NOVX nucleic 



CC acids can be used in gene therapy. This sequence represents a NOVX 

CC protein of the invention 

XX 

SQ Sequence 1398 AA; 

Query Match 36.1%; Score 1300; DB 6; Length 1398; 

Best Local Similarity 41.8%; Pred. No. 2.5e-52; 

Matches 228; Conservative 45; Mismatches 217; Indels 56; Gaps 11; 

Qy 90 RKSQCCPGFYESGEMCVPHCADKCVH-GRCIA-PNTCQCEPGWGGTNCSSACDGDHWGPH 147 

I 1 I : I | I I : I : I I I I III I :| III I I I I 

Db 808 RCQDCEAGWY--GPSCQTMCS--CANDGHCHQDTGHCSCAPGWTGFSCQRACDTGHWGPD 863 

Qy 148 CTSRCQCKNG-ALCNPITGACHCAAGFRGWRCE-DRCEQGTYGNDCHQRCQCQNGATCDH 205 

I : II I I : I : I I I I I : I I I I I I I : I I I I I I I I : I I I I I 

Db 864 CSHPCNCSAGHGSCDAISGLCLCEAGYVGPRCEQSECPQGHFGPGCEQRCQCQHGAACDH 923 

Qy 206 VTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCG 265 

I : I I I i I : I I I I III I I II I I I I I I I : I I I 
Db 924 VSGACTCPAGWRGTFCEHACPAGFFGLDCRSACNCTAGAACDAVNGSCLCPAGRRGPRCA 983 

Qy 266 Q-PCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETC 324 

: II : I I I I I I I I I : I I I I I I I : I I : I I I I I I I I : I 

Db 984 ESACPAHTYGHNCSQACACFNGASCDPVHGQCHCAPGWMGPSCLQACPAGLYGDNCRHSC 1043 

Qy 325 QCVNGGKCYHVSGACLCEAGFAGERCEAR LC 355 

I I I I I I II I I I : I I I I M 
Db 1044 LCQNGGTCDPVSGHCACPEGWAGLACEVECLPRDVRAGCRHSGGCLNGGLCDPHTGRCLC 1103 

Qy 356 P EGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETC 400 

I : I : I I : I I I I I : I I I I I I I : : I 

Db 1104 PAGWTGDKCQSPAACAKGTFGPHCEGRCACRWGG--PCHLATGACLCPPGWRGPHLSAAC 1161 

Qy 401 SPGFYGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKN 460 

I :: I I I I I I I I II I I I I I I I I I I I I j I : : I : I : II 
Db 1162 LRGWFGEACAQRCSCPPGAACHHVTGACRCPPGFTGSGCEQACPPGSFGEDCAQMCQCPG 1221 

Qy 461 D-AVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPG 519 

: || | : | : | | | : | I I I I I I : I I I I I I I I I : I : III I 
Db 1222 ENPACHPATGTCSCAAGYHGPSCQQRCPPGRYGPGCEQLCGCLNGGSCDAATGACRCPTG 1281 

Qy 520 WRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGP 579 

: I I I I I : I I I II | | | | I i I I : I I I : i : I : I 

Db 1282 FLGTDCNLTCPQGRFGPNCTHVCGCGQGAACDPVTGTCLCPPGRAGVRCERGCPQNRFGV 1341 

Qy 580 NCSLPC 585 

I I 

Db 1342 GCEHTC 1347 



RESULT 14 
ABJ37899 

ID ABJ37899 standard; protein; 1404 AA. 
XX 

AC ABJ37899; 
XX 

DT 22-MAY-2003 (first entry) 



XX 

DE NOVX protein sequence SEQ ID No 44. 
XX 

KW Hepatotropic; immunosuppressive; cardiant; hypertensive; tranquilizer; 

KW vulnerary; virucide; antibacterial; protozoacide ; fungicide; nootropic; 

KW antiparasitic; neuroprotective; cerebroprotective ; antiparkinsonian; 

KW anticonvulsant; antiaddictive; analgesic; dermatological; keratolytic; 

KW antiseborrheic; antirheumatic; antiarthritic; antiinflammatory; anti-HIV; 

KW cytostatic; antiasthmatic; antipsoriatic ; hypotensive; osteopathic; 

KW antiulcer; anorectic; antidiabetic; antiallergic; haemostatic; 

KW neuroleptic; antidepressant; antiinfertility; NOVX; human disease; 

KW NOVX-associated disorder; trauma; viral; bacterial; fungal; protozoal; 

KW parasitic infection; Alzheimer's disease; stroke; forensic biology; 

KW immunogen; non-human transgenic animal; gene therapy. 

XX 

OS Unidentified. 
XX 

PN WO200281517-A2 . 
XX 

PD 17-OCT-2002 . 
XX 

PF 22-JAN-2002; 2002WO-US002064. 
XX 

PR 19-JAN-2001; 2001US-0262892P. 

PR 23-JAN-2001; 2001US-0263598P. 

PR 24-JAN-2001; 2001US-0263799P. 

PR 25-JAN-2001; 2001US-0264117P. 

PR 25-JAN-2001; 2001US-0264139P. 

PR 26-JAN-2001; 2001US-0264478P. 

PR 30-JAN-2001; 2001US-0263351P. 

PR 02-MAR-2001; 2001US-0272870P. 

PR 14-MAR-2001; 2001US-0275927P. 

PR 14-MAR-2001; 2001US-0275990P. 

PR 15-MAR-2001; 2001US-0276449P. 

PR 20-MAR-2001; 2001US-0277358P. 

PR 23-MAR-2001; 2001US-0278151P. 

PR 29-MAR-2001; 2001US-0279857P. 

PR 20-APR-2001; 2001US-0285140P. 

PR 20-APR-2001; 2001US-0285141P. 

PR 30-APR-2001; 2001US-0287484P. 

PR 17-MAY-2001; 2001US-0291701P. 

PR 08-JUN-2001; 2001US-0296960P. 

PR 10-JUL-2001; 2001US-0304353P. 

PR 10-JUL-2001; 2001US-0304355P. 

PR 12-JUL-2001; 2001US-0304886P. 

PR 09-AUG-2001; 2001US-0311289P. 

PR 13-AUG-2001; 2001US-0311975P. 

PR 16-AUG-2001; 2001US-0312937P. 

PR 18-OCT-2001; 2001US-0330227P. 

PR 29-NOV-2001; 2001US-0334198P. 
XX 

PA (CURA-) CURAGEN CORP. 
XX 

PI Decristofaro MF, Padigaru M, Miller C, Tchernev V, Zhong H; 

PI Zhong M, Anderson D, Ballinger R, Gerlach V, Spytek KA, Rastelli L; 

PI Kekuda R, Guo X, Zerhusen B, Andrew D, Mezes P, Patturajan M; 

PI Burgess CE, Eisen A, Wolenc A, Baumgartner J, Shimkets RA, Gusev V; 



PI Vernet CAM, Taupier RJ, Pena C, Shenoy S, Li L, Casman S, Boldog F; 

PI Fernandes E, Smithson G, Malyankar U, Taillon B, Liu X; 

XX 

DR WPI; 2003-058504/05. 

DR N-PSDB; ABT33364. 
XX 

PT New polypeptides, designated as NOVX, useful for diagnosing and treating 

PT infections, neurological diseases, cancer, allergy, and bone, 

PT immunological, skin, renal, brain, muscle and autoimmune disorders. 

XX 

PS Claim 1; Page 124; 672pp; English. 
XX 

CC The invention relates to a novel isolated polypeptide, designated NOVX 

CC (NOV1 - 33) , consisting of a mature form of one of 61 sequences, given in 

CC the specification, or its variant, where amino acid residue (s) in the 

CC variant differ from the mature form, provided that the variant differs in 

CC not more than 15 % of the amino acids from the sequence of the mature 

CC form. The NOVX polypeptides, nucleic acids encoding the polypeptides, and 

CC an antibody to the polypeptides, are useful for treating or preventing a 

CC NOVX-associated disorder in humans and for treating a syndrome associated 

CC with a human disease (NOVX-associated disorder) . NOVX polypeptides and 

CC the encoding nucleic acids, are useful for determining the presence of or 

CC predisposition to a disease associated with altered levels of NOVX 

CC polypeptide and polynucleotide, by measuring the level of polypeptide 

CC expression or the amount of nucleic acid from a mammal and comparing it 

CC with another mammal not having or not predisposed to the disease. NOVX 

CC polypeptide is also useful for identifying an agent that binds to NOVX 

CC and a cell expressing NOVX is useful for identifying an agent that 

CC modulates the expression or activity of NOVX. The antibodies and a 

CC polypeptide having 95 % sequence identity to NOVX polypeptide are useful 

CC for treating a pathological state in a mammal. The antibodies are also 

CC useful for determining the presence or amount of NOVX in a sample. NOVX 

CC polypeptides, polynucleotides and antibodies specific for the 

CC polypeptides are useful for treating or preventing disorders or syndromes 

CC including trauma, viral, bacterial, fungal, protozoal, and parasitic 

CC infections. They can also treat disorders such as e.g., Alzheimer's 

CC disease or a stroke. The NOVX encoding nucleic acids are useful for 

CC expressing the NOVX proteins, to detect NOVX mRNA, or a genetic lesion in 

CC a NOVX gene and to modulate NOVX activity. NOVX sequences are also useful 

CC for identifying a cell or tissue type in a biological sample, to amplify 

CC DNA sequences from very small biological samples such as tissues e.g. 

CC hair or skin or body fluids in forensic biology and as primers and probes 

CC for use in identifying and/or cloning NOVX homologues in other cell 

CC types. The NOVX proteins are useful as an immunogen to generate 

CC antibodies which are useful for diagnostically monitoring protein levels 

CC and modulating NOVX activity. Cells comprising NOVX nucleic acids are 

CC useful for producing non-human transgenic animals which are useful for 

CC studying the function and/or activity of NOVX protein and for identifying 

CC and/or evaluating modulators of NOVX protein activity. The NOVX nucleic 

CC acids can be used in gene therapy. This sequence represents a NOVX 

CC protein of the invention 

XX 

SQ Sequence 1404 AA; 



Query Match 36.1%; Score 1300; DB 6; Length 1404; 

Best Local Similarity 41.8%; Pred. No. 2.5e-52; 

Matches 228; Conservative 45; Mismatches 217; Indels 56; Gaps 11; 



Qy 90 RKSQCCPGFYESGEMCVPHCADKCVH-GRCIA-PNTCQCEPGWGGTNCSSACDGDHWGPH 147 

I I I : I I I I : I : I I I I III I :| III MM 

Db 814 RCQDCEAGWY--GPSCQTMCS--CANDGHCHQDTGHCSCAPGWTGFSCQRACDTGHWGPD 869 

Qy 148 CTSRCQCKNG-ALCNPITGACHCAAGFRGWRCE-DRCEQGTYGNDCHQRCQCQNGATCDH 205 

! : II f I : I : I I I I I : I II I I I I : I I I I I I I I : I I I I I 
Db 870 CSHPCNCSAGHGSCDAISGLCLCEAGYVGPRCEQSECPQGHFGPGCEQRCQCQHGAACDH 929 

Qy 206 VTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCG 265 

I : I I I I I : I I II I I I I I I I I I I I I I I : I I I 
Db 930 VSGACTCPAGWRGTFCEHACPAGFFGLDCRSACNCTAGAACDAVNGSCLCPAGRRGPRCA 989 

Qy 266 Q-PCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETC 324 

: II : I I I I I I I I I : I I I I I I I : I I : I I I I I I I I : I 

Db 990 ESACPAHTYGHNCSQACACFNGASCDPVHGQCHCAPGWMGPSCLQACPAGLYGDNCRHSC 1049 

Qy 325 QCVNGGKCYHVSGACLCEAGFAGERCEAR LC 355 

I I I I I II I I I I : I I I I II 
Db 1050 LCQNGGTCDPVSGHCACPEGWAGLACEVECLPRDVRAGCRHSGGCLNGGLCDPHTGRCLC 1109 

Qy 356 P EGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETC 400 

I : | : | | : | | I I I : I I I I I I I :: I 

Db 1110 PAGWTGDKCQSPAACAKGTFGPHCEGRCACRWGG--PCHLATGACLCPPGWRGPHLSAAC 1167 

Qy 401 SPGFYGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKN 460 

I :: I I I I I I I I III I I I I I I I I I I | | | : : | : | : I I 
Db 1168 LRGWFGEACAQRCSCPPGAACHHVTGACRCPPGFTGSGCEQACPPGSFGEDCAQMCQCPG 1227 

Qy 461 D-AVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPG 519 

: II | : I : I I I : 1 I I I I I I : I I I I I I I I I : I : III I 
Db 1228 ENPACHPATGTCSCAAGYHGPSCQQRCPPGRYGPGCEQLCGCLNGGSCDAATGACRCPTG 1287 

Qy 520 WRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGP 579 

: I I I I I : I I I II I I I I ! I I I : I I I : I : I : I 

Db 1288 FLGTDCNLTCPQGRFGPNCTHVCGCGQGAACDPVTGTCLCPPGRAGVRCERGCPQNRFGV 1347 

Qy 580 NCSLPC 585 

I I 

Db 1348 GCEHTC 1353 



RESULT 15 
ABJ37901 

ID ABJ37901 standard; protein; 1450 AA. 
XX 

AC ABJ37901; 
XX 

DT 22-MAY-2003 (first entry) 
XX 

DE NOVX protein sequence SEQ ID No 48. 
XX 

KW Hepatotropic; immunosuppressive; cardiant; hypertensive; tranquilizer; 

KW vulnerary; virucide; antibacterial; protozoacide; fungicide; nootropic; 

KW antiparasitic; neuroprotective; cerebroprotective; antiparkinsonian; 

KW anticonvulsant; antiaddictive; analgesic; dermatological; keratolytic; 

KW antiseborrheic; antirheumatic; antiarthritic; antiinflammatory; anti-HIV; 



KW cytostatic; antiasthmatic; antipsoriatic; hypotensive; osteopathic; 

KW antiulcer; anorectic; antidiabetic; antiallergic; haemostatic; 

KW neuroleptic; antidepressant; antiinfertility; NOVX; human disease; 

KW NOVX-associated disorder; trauma; viral; bacterial; fungal; protozoal; 

KW parasitic infection; Alzheimer's disease; stroke; forensic biology; 

KW immunogen; non-human transgenic animal; gene therapy. 

XX 

OS Unidentified. 
XX 

PN WO200281517-A2 . 
XX 

PD 17-OCT-2002. 
XX 

PF 22-JAN-2002; 2002WO-US002064. 
XX 

PR 19-JAN-2001; 2001US-0262892P. 

PR 23-JAN-2001; 2001US-0263598P. 

PR 24-JAN-2001; 2001US-0263799P. 

PR 25-JAN-2001; 2001US-0264117P. 

PR 25-JAN-2001; 2001US-0264139P. 

PR 26-JAN-2001; 2001US-0264478P. 

PR 30-JAN-2001; 2001US-0263351P. 

PR 02-MAR-2001; 2001US-0272870P. 

PR 14-MAR-2001; 2001US-0275927P. 

PR 14-MAR-2001; 2001US-0275990P. 

PR 15-MAR-2001; 2001US-0276449P. 

PR 20-MAR-2001; 2001US-0277358P. 

PR 23-MAR-2001; 2001US-0278151P. 

PR 29-MAR-2001; 2001US-0279857P. 

PR 20-APR-2001; 2001US-0285140P. 

PR 20-APR-2001; 2001US-0285141P. 

PR 30-APR-2001; 2001US-0287484P. 

PR 17-MAY-2001; 2001US-0291701P. 

PR 08-JUN-2001; 2001US-0296960P. 

PR 10-JUL-2001; 2001US-0304353P. 

PR 10-JUL-2001; 2001US-0304355P. 

PR 12-JUL-2001; 2001US-0304886P. 

PR 09-AUG-2001; 2001US-0311289P. 

PR 13-AUG-2001; 2001US-0311975P. 

PR 16-AUG-2001; 2001US-0312937P. 

PR 18-OCT-2001; 2001US-0330227P. 

PR 29-NOV-2001; 2001US-0334198P. 
XX 

PA (CURA-) CURAGEN CORP. 
XX 

PI Decristofaro MF, Padigaru M, Miller C, Tchernev V, Zhong H; 

PI Zhong M, Anderson D, Ballinger R, Gerlach V, Spytek KA, Rastelli L; 

PI Kekuda R, Guo X, Zerhusen B, Andrew D, Mezes P, Patturajan M; 

PI Burgess CE, Eisen A, Wolenc A, Baumgartner J, Shimkets RA, Gusev V; 

PI Vernet CAM, Taupier RJ, Pena C, Shenoy S, Li L, Casman S, Boldog F; 

PI Fernandes E, Smithson G, Malyankar U, Taillon B, Liu X; 

XX 

DR WPI; 2003-058504/05. 

DR N-PSDB; ABT33366. 
XX 

PT New polypeptides, designated as NOVX, useful for diagnosing and treating 

PT infections, neurological diseases, cancer, allergy, and bone, 



PT immunological, skin, renal, brain, muscle and autoimmune disorders. 
XX 

PS Claim 1; Page 129-130; 672pp; English. 
XX 

CC The invention relates to a novel isolated polypeptide, designated NOVX 

CC (NOV1 - 33) , consisting of a mature form of one of 61 sequences, given in 

CC the specification, or its variant, where amino acid residue (s) in the 

CC variant differ from the mature form, provided that the variant differs in 

CC not more than 15 % of the amino acids from the sequence of the mature 

CC form. The NOVX polypeptides, nucleic acids encoding the polypeptides, and 

CC an antibody to the polypeptides, are useful for treating or preventing a 

CC NOVX-associated disorder in humans and for treating a syndrome associated 

CC with a human disease (NOVX-associated disorder) . NOVX polypeptides and 

CC the encoding nucleic acids, are useful for determining the presence of or 

CC predisposition to a disease associated with altered levels of NOVX 

CC polypeptide and polynucleotide, by measuring the level of polypeptide 

CC expression or the amount of nucleic acid from a mammal and comparing it 

CC with another mammal not having or not predisposed to the disease. NOVX 

CC polypeptide is also useful for identifying an agent that binds to NOVX 

CC and a cell expressing NOVX is useful for identifying an agent that 

CC modulates the expression or activity of NOVX. The antibodies and a 

CC polypeptide having 95 % sequence identity to NOVX polypeptide are useful 

CC for treating a pathological state in a mammal. The antibodies are also 

CC useful for determining the presence or amount of NOVX in a sample. NOVX 

CC polypeptides, polynucleotides and antibodies specific for the 

CC polypeptides are useful for treating or preventing disorders or syndromes 

CC including trauma, viral, bacterial, fungal, protozoal, and parasitic 

CC infections. They can also treat disorders such as e.g., Alzheimer's 

CC disease or a stroke. The NOVX encoding nucleic acids are useful for 

CC expressing the NOVX proteins, to detect NOVX mRNA, or a genetic lesion in 

CC a NOVX gene and to modulate NOVX activity. NOVX sequences are also useful 

CC for identifying a cell or tissue type in a biological sample, to amplify 

CC DNA sequences from very small biological samples such as tissues e.g. 

CC hair or skin or body fluids in forensic biology and as primers and probes 

CC for use in identifying and/or cloning NOVX homologues in other cell 

CC types. The NOVX proteins are useful as an immunogen to generate 

CC antibodies which are useful for diagnostically monitoring protein levels 

CC and modulating NOVX activity. Cells comprising NOVX nucleic acids are 

CC useful for producing non-human transgenic animals which are useful for 

CC studying the function and/or activity of NOVX protein and for identifying 

CC and/or evaluating modulators of NOVX protein activity. The NOVX nucleic 

CC acids can be used in gene therapy. This sequence represents a NOVX 

CC protein of the invention 

XX 

SQ Sequence 1450 AA; 

Query Match 36.0%; Score 1297.5; DB 6; Length 1450; 

Best Local Similarity 43.2%; Pred. No. 3.3e-52; 

Matches 221; Conservative 47; Mismatches 216; Indels 27; Gaps 6; 

Qy 94 CCPGFYESGEMCVPHCADKCVHG RCIAPNT CQCEPGWGGTNCS 136 

Mill I I I I II I I I I II I : I 

Db 800 CLPGFVGS RCQDVCPAGWYGPSCQTRCSCANDGHCHPATGHCSCAPGWTGFSCQ 853 

Qy 137 SACDGDHWGPHCTSRCQCKNG-ALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRC 195 

Ml Mill: II I I : I : I I I I I : I I I I : I II : I III 
Db 854 RACDTGHWGPDCSHPCNCSAGHGSCDAISGLCLCEAGYVGPRCEQQCPQGHFGPGCEQLC 913 



Qy 196 QCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSC 255 

I I I : I I I I I I : I I I I I : I I I I 111 I I I i I I I I I I 
Db 914 QCQHGAACDHVSGACTCPAGWRGTFCEHACPAGFFGLDCRSACNCTAGAACDAVNGSCLC 973 

Qy 256 PSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGT 315 

I : I I I : I I I : I I I I I [Mill : I I I hi I : I I I 
Db 974 PAGRRGPRCAETCPAGLYGDNCRHSCLCQNGGTCDPVSGHCACPEGWAGLACEKECPPRD 1033 

Qy 316 YGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLENT 375 

I : hllll : I | | | I I : I I : : I : : I I I I : I I 
Db 1034 VRAGCRHSGGCLNGGLCDPHTGRCLCPAGWAGDKCQSP-CLRGWPGEACAQHCSC--PPG 1090 

Qy 376 HSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTCAPGFK 435 

: I I : : I I i I I : : I I : I II I I I : I : I I I I I I : I I I I II 
Db 1091 AACHHVTGACRCPPGFTGSGCEQGCPPGRYGPGCEQLCGCLNGGSCDAATGACRCPTGFL 1150 

Qy 436 GIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFGC 495 

III: I I I : I I I : III I I I I I : I I I III II : I I I 
Db 1151 GTDCNLTCPQGRFGPNCTHVCGCGQGAACDPVTGTCLCPPGRAGVRCERGCPQNRFGVGC 1210 

Qy 496 NLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTG 555 

I I I I I I I : : I : I : I III I I I I III I I I : III 
Db 1211 EHTCSCRNGGLCHASNGSCSCGLGWTGRHCELACPPGRYGAACHLECSCHNNSTGEPATG 1270 

Qy 556 HCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586 

I I I I I : I i : I I I I I : 
Db 1271 TCRCGPGFYGQACEHPCPPGFHGAGCQGLCW 1301 

Search completed: March 26, 2004, 16:08:54 
Job time : 38.6674 secs 
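
Note on the per-result statistics: the figures printed for the results above are consistent with two simple ratios. "Query Match" agrees with Score divided by the Perfect score (3601), and "Best Local Similarity" agrees with Matches divided by the total of Matches, Conservative, Mismatches and Indels. This is inferred from the printed numbers only; the report does not state either formula, and the function names below are illustrative. A minimal sketch that recomputes the values for Results 12, 13 and 15:

```python
# Assumed relationships, inferred from the printed figures (not stated by the report):
#   Query Match           = Score / Perfect score
#   Best Local Similarity = Matches / (Matches + Conservative + Mismatches + Indels)

PERFECT_SCORE = 3601  # "Perfect score" from the report header


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the query's perfect (self-match) score."""
    return 100.0 * score / perfect_score


def best_local_similarity_percent(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# (result no., score, matches, conservative, mismatches, indels) as printed above
RESULTS = [
    (12, 1303.0, 237, 45, 207, 150),
    (13, 1300.0, 228, 45, 217, 56),
    (15, 1297.5, 221, 47, 216, 27),
]

if __name__ == "__main__":
    for number, score, matches, conservative, mismatches, indels in RESULTS:
        qm = query_match_percent(score)
        bls = best_local_similarity_percent(matches, conservative, mismatches, indels)
        print(f"RESULT {number}: Query Match {qm:.1f}%; Best Local Similarity {bls:.1f}%")
```

Rounded to one decimal place, the recomputed values agree with the percentages printed for these three results, which can serve as a check when repairing OCR-damaged statistics lines.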



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 

OM protein - protein search, using sw model 

Run on: March 26, 2004, 16:06:56 ; Search time 13.241 Seconds 
(without alignments) 
2284.780 Million cell updates/sec 

Title: US-10-092-390-4 
Perfect score: 3601 
Sequence: 1 MVISLNSCLSFICLLLCHWI HCDSVCAEGRWGPNCSLPCY 586 

Scoring table: BLOSUM62 
Gapop 10.0 , Gapext 0.5 

Searched: 389414 seqs, 51625971 residues 

Total number of hits satisfying chosen parameters: 389414 

Minimum DB seq length: 0 
Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 
Maximum Match 100% 
Listing first 45 summaries 

Database : Issued Patents_AA:* 
1: /cgn2_6/ptodata/2/iaa/5A_COMB.pep:* 
2: /cgn2_6/ptodata/2/iaa/5B_COMB.pep:* 
3: /cgn2_6/ptodata/2/iaa/6A_COMB.pep:* 
4: /cgn2_6/ptodata/2/iaa/6B_COMB.pep:* 
5: /cgn2_6/ptodata/2/iaa/PCTUS_COMB.pep:* 
6: /cgn2_6/ptodata/2/iaa/backfiles1.pep:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 

Result          Query
   No.   Score  Match  Length  DB  ID                 Description
     1                          3                     Sequence 332, App
     2     779
     3     779   21.6     299                         Sequence
     4     759            188                         Sequence
     5     719                                        Sequence 18, Appl
     6                                                Sequence
     7                                                Sequence
     8                                                Sequence 10, Appl
     9     678                     08-                Sequence 10, Appl
    10     676                     08-                Sequence 19, Appl
    11     676   18.8              08-899-232-4       Sequence 4, Appli



    12   666.5   18.5    2471   1  US-08-185-432-16   Sequence 16, Appl
    13           18.5    2471   1  US-08-083-590A-19  Sequence 19,
    14                          3  US-08-532-384-19   Sequence 19,
    15                             US-08-899-232-1    Sequence 1,
    16                             US-08-185-432-17   Sequence 17,
    17             .4           4  US-08-899-232-2    Sequence 2,
    18           18.4           1  US-08-083-590A-20  Sequence 20,
    19           18.4           3  US-08-532-384-20   Sequence 20,
    20           18             4  US-09-467-997-1    Sequence 1,
    21           17.9           4  US-09-796-575-2    Sequence 2,
    22     636   17.7           4  US-08-793-273C-4   Sequence 4,
    23                          5  PCT-US95-11684-4   Sequence 4,
    24                                                Sequence 2,
    25                          5                     Sequence 2,
    26                          3  US-08-882-046-4    Sequence 4,
    27             .6           3  US-09-214-278-2    Sequence 2,
    28                          4  US-09-855-722-2    Sequence
    29                          2  US-08-400-159-8    Sequence 8,
    30                             US-09-214-278-3    Sequence 3,
    31                          4                     Sequence 3,
    32                             US-08-611-729A-8
    33                             US-08-882-046-7    Sequence 7,
    34                             US-09-068-740A-6   Sequence
    35                             US-09-068-740A-7   Sequence 7,
    36                          4  US-09-199-865-1    Sequence 1,
    37                          2  US-08-400-159-6    Sequence 6,
    38           17             3  US-08-611-729A-6   Sequence
    39                                                Sequence


2, 


Appli 
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17 
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1218 


4 


US-09-068-740A-11 


Sequence 


11, 
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1238 
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US-09-214-278-5 
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5, 
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4 


US-09-855-722-5 
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Appli 
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17. 
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US-08-882-046-6 


Sequence 


6, 


Appli 


44 
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17. 
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1218 
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US-09-214-278-7 


Sequence 
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45 


619 


17 . 


2 


1218 


4 


US-09-855-722-7 


Sequence 


7, 


Appli 



ALIGNMENTS 



RESULT 1 

US-09-188-930-332 

; Sequence 332, Application US/09188930A
; Patent No. 6150502
; GENERAL INFORMATION:
; APPLICANT: Watson, James D.
; APPLICANT: Strachan, Lorna
; APPLICANT: Sleeman, Matthew
; APPLICANT: Onrust, Rene
; APPLICANT: Murison, James Greg
; TITLE OF INVENTION: Compositions Isolated From Skin Cells
; TITLE OF INVENTION: and Methods For Their Use
; FILE REFERENCE: 11000.1011c1
; CURRENT APPLICATION NUMBER: US/09/188,930A
; CURRENT FILING DATE: 1998-11-09
; NUMBER OF SEQ ID NOS: 348
; SOFTWARE: FastSEQ for Windows Version 3.0
; SEQ ID NO 332
; LENGTH: 299
; TYPE: PRT
; ORGANISM: Mouse
US-09-188-930-332 

Query Match 21.6%; Score 779; DB 3; Length 299; 

Best Local Similarity 39.9%; Pred. No. 7e-43; 

Matches 127; Conservative 27; Mismatches 118; Indels 46; Gaps 1; 

Qy 165 GACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDL 224 

111:11111 I I I I : I I II I! II hi I MM II II 
Db 4 GACYCPAGFLGADCSLACPQGRFGPSCAHVCTCGQGAACDPVSGTCICPPGKTGGHCERG 63 

Qy 225 CPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCH 284

II : I I I : I I : I I I : I I I I I I I I I I I I III: I Ml 

Db 64 CPQDRFGKGCEHKCACRNGGLCHATNGSCSCPLGWMGPHCEHACPAGRYGAACLLECSCQ 123

Qy 285 NGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAG 344 

MM: M I I I I : I : I M II I M I I M I I I M I I I I I 
Db 124 NNGSCEPTSGACLCGPGFYGQACEDTCPAGFHGSGCQRVCECQQGAPCDPVSGRCLCPAG 183 

Qy 345 FAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGF 404 

IM M I I I I 

Db 184 FRGQ----------------------------------------------FCERGCKPGF 197

Qy 405 YGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVC 464

M : I I I M I I I : M I I I I I I I I : I Mill II 
Db 198 FGDGCLQQCNCPTGVPCDPISGLCLCPPGRAGTTCDLDCRRGRFGPGCALRCDCGGGADC 257

Qy 465 SPVDGSCTCKAGWHGVDC 482

Mill : I I 
Db 258 DPISGQCHCVDSYTGPTC 275 



RESULT 2 

US-09-312-283C-192 

; Sequence 192, Application US/09312283C
; Patent No. 6573095
; GENERAL INFORMATION:
; APPLICANT: Watson, James D.
; APPLICANT: Strachan, Lorna
; APPLICANT: Sleeman, Matthew
; APPLICANT: Onrust, Rene
; APPLICANT: Murison, James G.
; APPLICANT: Kumble, Krishanand D.
; TITLE OF INVENTION: Compositions Isolated from Skin Cells
; TITLE OF INVENTION: and Methods for Their Use
; FILE REFERENCE: 11000.1011c2
; CURRENT APPLICATION NUMBER: US/09/312,283C
; CURRENT FILING DATE: 1999-05-14
; NUMBER OF SEQ ID NOS: 425
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 192
; LENGTH: 299
; TYPE: PRT
; ORGANISM: Mouse
US-09-312-283C-192 



Query Match 21.6%; Score 779; DB 4; Length 299;

Best Local Similarity 39.9%; Pred. No. 7e-43; 

Matches 127; Conservative 27; Mismatches 118; Indels 46; Gaps 1; 

Qy 165 GACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDL 224 

111:11111 I I I I : I I II I I I I I : I I I I I I I I II 
Db 4 GACYCPAGFLGADCSLACPQGRFGPSCAHVCTCGQGAACDPVSGTCICPPGKTGGHCERG 63 

Qy 225 CPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCH 284 

I I : I II : I 1 : I I I : I I I MM I I I I I II MM I Ml 
Db 64 CPQDRFGKGCEHKCACRNGGLCHATNGSCSCPLGWMGPHCEHACPAGRYGAACLLECSCQ 123 

Qy 285 NGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAG 344

I MM M 1 I I I : M M I I I I M I Ml I I I I I I II II 

Db 124 NNGSCEPTSGACLCGPGFYGQACEDTCPAGFHGSGCQRVCECQQGAPCDPVSGRCLCPAG 183 

Qy 345 FAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGF 404 

I M M MM 

Db 184 FRGQ----------------------------------------------FCERGCKPGF 197

Qy 405 YGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVC 464 

M : I I I : I I II : M I I M I I I I M Mill II 
Db 198 FGDGCLQQCNCPTGVPCDPISGLCLCPPGRAGTTCDLDCRRGRFGPGCALRCDCGGGADC 257 

Qy 465 SPVDGSCTCKAGWHGVDC 482

Mil! : I I 
Db 258 DPISGQCHCVDSYTGPTC 275



RESULT 3 

US-09-312-283C-332 

; Sequence 332, Application US/09312283C
; Patent No. 6573095
; GENERAL INFORMATION:
; APPLICANT: Watson, James D.
; APPLICANT: Strachan, Lorna
; APPLICANT: Sleeman, Matthew
; APPLICANT: Onrust, Rene
; APPLICANT: Murison, James G.
; APPLICANT: Kumble, Krishanand D.
; TITLE OF INVENTION: Compositions Isolated from Skin Cells
; TITLE OF INVENTION: and Methods for Their Use
; FILE REFERENCE: 11000.1011c2
; CURRENT APPLICATION NUMBER: US/09/312,283C
; CURRENT FILING DATE: 1999-05-14
; NUMBER OF SEQ ID NOS: 425
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 332
; LENGTH: 299
; TYPE: PRT
; ORGANISM: Mouse
US-09-312-283C-332 



Query Match 21.6%; Score 779; DB 4; Length 299; 

Best Local Similarity 39.9%; Pred. No. 7e-43; 

Matches 127; Conservative 27; Mismatches 118; Indels 46; Gaps 1; 



Qy 165 GACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDL 224

I I I : I I I I I I I I I : I I I I I I I I I : I I I I I I I I 

Db 4 GACYCPAGFLGADCSLACPQGRFGPSCAHVCTCGQGAACDPVSGTCICPPGKTGGHCERG 63

Qy 225 CPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCH 284

II : I II : I I : I I I : I I I I I I I I I I I I I I I I : I I III 

Db 64 CPQDRFGKGCEHKCACRNGGLCHATNGSCSCPLGWMGPHCEHACPAGRYGAACLLECSCQ 123 

Qy 285 NGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAG 344

I I : I : : I I I I I : I : I : I I I I : I I i : I I I I I I I I I I I 
Db 124 NNGSCEPTSGACLCGPGFYGQACEDTCPAGFHGSGCQRVCECQQGAPCDPVSGRCLCPAG 183

Qy 345 FAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGF 404 

II: : I I I I I 

Db 184 FRGQ----------------------------------------------FCERGCKPGF 197

Qy 405 YGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVC 464

: I : I I I : I I I I : : i I I I I I I I I : I I : I I I II 

Db 198 FGDGCLQQCNCPTGVPCDPISGLCLCPPGRAGTTCDLDCRRGRFGPGCALRCDCGGGADC 257 

Qy 465 SPVDGSCTCKAGWHGVDC 482

I : I I I : I I 
Db 258 DPISGQCHCVDSYTGPTC 275 



RESULT 4 

US-09-188-930-192 

; Sequence 192, Application US/09188930A
; Patent No. 6150502
; GENERAL INFORMATION:
; APPLICANT: Watson, James D.
; APPLICANT: Strachan, Lorna
; APPLICANT: Sleeman, Matthew
; APPLICANT: Onrust, Rene
; APPLICANT: Murison, James Greg
; TITLE OF INVENTION: Compositions Isolated From Skin Cells
; TITLE OF INVENTION: and Methods For Their Use
; FILE REFERENCE: 11000.1011c1
; CURRENT APPLICATION NUMBER: US/09/188,930A
; CURRENT FILING DATE: 1998-11-09
; NUMBER OF SEQ ID NOS: 348
; SOFTWARE: FastSEQ for Windows Version 3.0
; SEQ ID NO 192
; LENGTH: 299
; TYPE: PRT
; ORGANISM: mouse
; FEATURE:
; NAME/KEY: UNSURE
; LOCATION: (98)...(98)
; NAME/KEY: UNSURE
; LOCATION: (239)...(239)
US-09-188-930-192 



Query Match 21.1%; Score 759; DB 3; Length 299; 

Best Local Similarity 39.3%; Pred. No. 1.3e-41; 

Matches 125; Conservative 27; Mismatches 120; Indels 46; Gaps 1;



Qy 165 GACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDL 224

I I I : I I I I I I II: I II I I I I I : I I I I i I I I II 

Db 4 GACYCPAGFLGADCSLACPQGRFGPSCAHVCTCGQGAACDPVSGTCICPPGKTGGHCERG 63 

Qy 225 CPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCH 284

II : I I I : I I : I i I : I I I I I I I I I I I I I I I : I I III 

Db 64 CPQDRFGKGCEHKCACRNGGLCHATNGSCSCPLGXMGPHCEHACPAGRYGAACLLECSCQ 123 

Qy 285 NGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAG 344 

I I : I : : I I I I I : I : I : I I I I : I I I : I I I I I I I I i I I 
Db 124 NNGSCEPTSGACLCGPGFYGQACEDTCPAGFHGSGCQRVCECQQGAPCDPVSGRCLCPAG 183 

Qy 345 FAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGF 404 

II: : I I I I I 

Db 184 FRGQ----------------------------------------------FCERGCKPGF 197

Qy 405 YGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVC 464

: I : I I I : I I I I : : I I I I I I I I : I hill II 

Db 198 FGDGCLQQCNCPTGVPCDPISGLCLCPPGRAGTTCDLDCRRXRFGPGCALRCDCGGGADC 257 

Qy 465 SPVDGSCTCKAGWHGVDC 482 

I : I I I : I I 
Db 258 DPISGQCHCVDSYTGPTC 275 



RESULT 5 

US-08-185-432-18 

; Sequence 18, Application US/08185432
; Patent No. 5750652
; GENERAL INFORMATION:
; APPLICANT: Artavanis-Tsakonas, Spyridon
; APPLICANT: Busseau, Isabelle
; APPLICANT: Diederich, Robert J.
; APPLICANT: Xu, Tian
; APPLICANT: Matsuno, Kenji
; TITLE OF INVENTION: DELTEX PROTEINS, NUCLEIC ACIDS, AND
; TITLE OF INVENTION: ANTIBODIES, AND RELATED METHODS AND COMPOSITIONS
; NUMBER OF SEQUENCES: 23
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: PENNIE & EDMONDS
; STREET: 1155 Avenue of the Americas
; CITY: New York
; STATE: New York
; COUNTRY: U.S.A.
; ZIP: 10036-2711
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/185,432
; FILING DATE: 21-JAN-1994
; CLASSIFICATION: 530
; ATTORNEY/AGENT INFORMATION:
; NAME: Misrock, S. Leslie
; REGISTRATION NUMBER: 18,872
; REFERENCE/DOCKET NUMBER: 7326-006
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: (212) 790-9090
; TELEFAX: (212) 869-8864/9741
; TELEX: 66141 PENNIE
; INFORMATION FOR SEQ ID NO: 18:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 2523 amino acids
; TYPE: amino acid
; TOPOLOGY: unknown
; MOLECULE TYPE: protein
US-08-185-432-18 

Query Match 20.0%; Score 719; DB 1; Length 2523; 

Best Local Similarity 25.9%; Pred. No. 3.2e-38; 

Matches 225; Conservative 60; Mismatches 221; Indels 364; Gaps 51; 

Qy 5 LNSCLSFICL LLCHWIGTASPLNLED PNVCSHW ESYS 41 

: I : I : : I I I : I : : I 

Db 603 INECLSKPCLNGGQCTDRENGYICTCPKGTTGVNCETKIDDCASNLCDNGKCIDKIDGYE 662 

Qy 42 VTVQESYPHPFDQIYYT SCTDILNWFKCTRHRVSYRTAYRHGEKTMYRR 90

I : I I : I I : I I I 
Db 663 CTCEPGYTGKLCNININECDSNPCRNGGTCKDQINGFTCV 702

Qy 91 KSQCCPGFYESGEMC VPHC-ADKCVHGRC IAPNTCQCEPGWGGTNC SSACD 140 

II I M I I : : I : I I I : I I I I I I : I I : : I : 

Db 703 CPDGYHD-HMCLSEVNECNSNPCIHGACHDGVNGYKCDCEAGWSGSNCDINNNECE 757 

Qy 141 GDHWGPHCTSRCQCKNGALCNPITGA--CHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQ 198 

: I I I I : I I I I I I I I | | : I I : I I 

Db 758 SN PCMNGGTCKDMTGAYICTCKAGFSGPNCQ TNINECSSN-PCL 800

Qy 199 NGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRC PCQNGGVCHH V 249 

I 11111:11 I I I I II : I I 11:1111 

Db 801 NHGTCIDDVAGYKCNCMLPYTGAICEAVLAP CAGS PCKNGGRCKESEDFE 850 

Qy 250 TGECSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTCDAATG--QCHCSPGYTG 304

I I I I I I I I : II I I I 1 I I : I : I I I I f I 

Db 851 TFSCECPPGWQGQTC EIDMNECVNRPCRNGATCQNTNGSYKCNCKPGYTG 900 

Qy 305 ERCQ DECPVGTYGVLCAETCQCVNGGKCYHVSGA--CLCEAGFAGERCEARL 354

I : 1:1 : Mill I I I I I I : I I : 

Db 901 RNCEMDIDDC QPNPCHNGGSCSDGINMFFCNCPAGFRGPKCEEDINECAS 950 

Qy 355 CPEGLYGIKCDKRCPCHLE NTHSCHPMSG ECAC 387 

I I III: I I I : | : I I I 

Db 951 NPCKNGANCTDCVNSYTCTCQPGFSGIHCESNTPDCTESSCFNGGTC--IDGINTFTCQC 1008

Qy 388 KPGWSGLYC NE TCSPGFYGEACQQI CS CQ 416

I I : : I II M 111:111: I I : 

Db 1009 PPGFTGSYCQHDINECDSKPCLNGGTCQDSYGTYKCTCPQGYTGLNCQNLVRWCDSSPCK 1068 

Qy 417 NGADCDSVTG--KCTCAPGFKGIDCSTP 442

M I : I I I : I : I I 

Db 1069 NGGKCWQTNNFYRCECKSGWTGVYCDVPSVSCEVAAKQQGVDIVHLCRNSGMCVDTGNTH 1128 



Qy 443 CPLGTYGINCSSR CG CKNDAVCSPVDG--SCTCKAGWHGVDCS 483

I I I I : I hill: I I I I I I : I I I : I I 

Db 1129 FCRCQAGYTGSYCEEQVDECSPNPCQNGATCTDYLGGYSCECVAGYHGVNCSEEINECLS 1188

Qy 484 IRCPSGTWGFGCNLT C QCLNGGACNTLDG 512

I I I I I I : I : I I I I I 

Db 1189 HPCQNGGTCIDLINTYKCSCPRGTQGVHCEINVDDCTPFYDSFTLEPKCFNNGKCIDRVG 1248 

Qy 513 --TCTCAPGWRGEKCE LPCQD-GTYGLNCAE RCDC SH 546

I I I I : I I : I I II II II : I I : I I 

Db 1249 GYNCICPPGFVGERCEGDVNECLSNPCDSRGTQ--NCIQLVNDYRCECRQGFTGRRCESV 1306

Qy 547 ADGC HPTTGH-CRCLPGWSGVHCD 569 

I II : I I : I I I : I I : 

Db 1307 VDGCKGMPCRNGGTCAVASNTERGFICKCPPGFDGATCEYDSRTCSNLRCQNGGTCISVL 1366 

Qy 570 SVCAEGRWGPNC SLPCY 586 

I I : I I I I i I I I 

Db 1367 TSSKCVCSEGYTGATCQYPVISPCASHPCY 1396 



RESULT 6 

US-08-899-232-3 

; Sequence 3, Application US/08899232
; Patent No. 6436650
; GENERAL INFORMATION:
; APPLICANT: Artavanis-Tsakonas, Spyridon
; APPLICANT: Qi, Huilin
; TITLE OF INVENTION: ACTIVATED FORMS OF NOTCH AND METHODS BASED THEREON
; FILE REFERENCE: 7326-046
; CURRENT APPLICATION NUMBER: US/08/899,232
; CURRENT FILING DATE: 1997-07-23
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 3
; LENGTH: 2523
; TYPE: PRT
; ORGANISM: Xenopus sp.
US-08-899-232-3 



Query Match 20.0%; Score 719; DB 4; Length 2523; 

Best Local Similarity 25.9%; Pred. No. 3.2e-38; 

Matches 225; Conservative 60; Mismatches 221; Indels 364; Gaps 51; 

Qy 5 LNSCLSFICL LLCHWIGTASPLNLED PNVCSHW ESYS 41 

: I : I : : I I I : I : : I 

Db 603 INECLSKPCLNGGQCTDRENGYICTCPKGTTGVNCETKIDDCASNLCDNGKCIDKIDGYE 662 

Qy 42 VTVQESYPHPFDQIYYT SCTDILNWFKCTRHRVSYRTAYRHGEKTMYRR 90 

I : I I : | | : | | | 
Db 663 CTCEPGYTGKLCNININECDSNPCRNGGTCKDQINGFTCV 702



Qy 91 KSQCCPGFYESGEMC VPHC-ADKCVHGRC IAPNTCQCEPGWGGTNC SSACD 140

II I M I I : : I : | I | : I I I I I I : I I : : I : 

Db 703 CPDGYHD-HMCLSEVNECNSNPCIHGACHDGVNGYKCDCEAGWSGSNCDINNNECE 757



Qy 141 GDHWGPHCTSRCQCKNGALCNPITGA--CHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQ 198

: I I I I : 1 I I I I I I I I I : I I : I I 

Db 758 SN PCMNGGTCKDMTGAYICTCKAGFSGPNCQ TNINECSSN-PCL 800 

Qy 199 NGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRC PCQNGGVCHH V 249 

I 11111:11 I I I I I I : ! 1 I I : I I I I 

Db 801 NHGTCIDDVAGYKCMCMLPYTGAICEAVLAP CAGSPCKNGGRCKESEDFE 850 

Qy 250 TGECSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTCDAATG--QCHCSPGYTG 304

I I I I I I I I : II I I I I I I : I : I I I II I 

Db 851 TFSCECPPGWQGQTC EIDMNECVNRPCRNGATCQNTNGSYKCNCKPGYTG 900

Qy 305 ERCQ DECPVGTYGVLCAETCQCVNGGKCYHVSGA--CLCEAGFAGERCEARL 354

I : I : I : I I I I I I I I I I I : I I : 

Db 901 RNCEMDIDDC QPNPCHNGGSCSDGINMFFCNCPAGFRGPKCEEDINECAS 950 

Qy 355 CPEGLYGIKCDKRCPCHLE NTHSCHPMSG ECAC 387 

I I I I I : I I I : I : I II 

Db 951 NPCKNGANCTDCVNSYTCTCQPGFSGIHCESNTPDCTESSCFNGGTC--IDGINTFTCQC 1008

Qy 388 KPGWSGLYC NE TCSPGFYGEACQQI CS CQ 416 

I I : : I I I II I I I : I I I : I I : 

Db 1009 PPGFTGSYCQHDINECDSKPCLNGGTCQDSYGTYKCTCPQGYTGLNCQNLVRWCDSSPCK 1068 

Qy 417 NGADCDSVTG--KCTCAPGFKGIDCSTP 442

III : I I I : I : I I 

Db 1069 NGGKCWQTNNFYRCECKSGWTGVYCDVPSVSCEVAAKQQGVDIVHLCRNSGMCVDTGNTH 1128 

Qy 443 CPLGTYGINCSSR CG CKNDAVCSPVDG--SCTCKAGWHGVDCS 483

I I I I : I I : I I I : I I I I I I : I II : II 

Db 1129 FCRCQAGYTGSYCEEQVDECSPNPCQNGATCTDYLGGYSCECVAGYHGVNCSEEINECLS 1188 

Qy 484 IRCPSGTWGFGCNLT C QCLNGGACNTLDG 512

I I I I I I : I : I I I I I 

Db 1189 HPCQNGGTCIDLINTYKCSCPRGTQGVHCEINVDDCTPFYDSFTLEPKCFNNGKCIDRVG 1248 

Qy 513 --TCTCAPGWRGEKCE LPCQD-GTYGLNCAE RCDC SH 546 

I I I I : I I : I I II II II: I I : I I 

Db 1249 GYNCICPPGFVGERCEGDVNECLSNPCDSRGTQ--NCIQLVNDYRCECRQGFTGRRCESV 1306 

Qy 547 ADGC HPTTGH-CRCLPGWSGVHCD 569 

III : I I : I I I : I I : 

Db 1307 VDGCKGMPCRNGGTCAVASNTERGFICKCPPGFDGATCEYDSRTCSNLRCQNGGTCISVL 1366 

Qy 570 SVCAEGRWGPNC SLPCY 586 

I I : I I I I I I I I 

Db 1367 TSSKCVCSEGYTGATCQYPVISPCASHPCY 1396 



RESULT 7 
US-09-230-652-2 

; Sequence 2, Application US/09230652A
; Patent No. 6537775
; GENERAL INFORMATION:
; APPLICANT: Tournier-Lasserve, Elisabeth
; APPLICANT: Joutel, Anne
; APPLICANT: Bousser, Marie-Germaine
; APPLICANT: Bach, Jean-Francois
; TITLE OF INVENTION: GENE INVOLVED IN CADASIL, METHOD OF DIAGNOSIS AND
; TITLE OF INVENTION: THERAPEUTIC APPLICATION
; FILE REFERENCE: 03715.0048-00000
; CURRENT APPLICATION NUMBER: US/09/230,652A
; CURRENT FILING DATE: 1999-05-17
; EARLIER APPLICATION NUMBER: FR 96 09733
; EARLIER FILING DATE: 1996-08-01
; EARLIER APPLICATION NUMBER: FR 97 04680
; EARLIER FILING DATE: 1997-04-16
; EARLIER APPLICATION NUMBER: PCT/FR97/01433
; EARLIER FILING DATE: 1997-07-31
; NUMBER OF SEQ ID NOS: 163
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 2
; LENGTH: 2321
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: human ADNc No. 6537775ch 3
US-09-230-652-2 

Query Match 19.4%; Score 697; DB 4; Length 2321; 

Best Local Similarity 25.2%; Pred. No. 7.6e-37;

Matches 226; Conservative 61; Mismatches 249; Indels 360; Gaps 51 
Qy 5 LNSCLS FICLLLCHWIGTASPLNLED PNVC-SHWESY 40



Db 432 VNECLSGPCRNQATCLDRIGQFTCICMAGFTGTYCEVDIDECQSSPCVNGGVCKDRWGF 491 

Qy 41 SVTVQESYPHPFDQIYYTSC--TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGF 98 

II : I : I I I I I I : : I I I 

Db 492 SCTCPSGFSGSTCQLDVDECASTPCRNGAKCVDQPDGY ECRCAEGF 537

Qy 99 YESGEMC VPHCA-DKCVHGRC IAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSR 151 

I : I I I : I I I I II II : I I I I : I I I I I I I : 

Db 538 --EGTLCDRNVDDCSPDPCHHGRCVDGIAS FSCACAPGYTGTRCESQVD ECRSQ 589 

Qy 152 CQCKNGALCNPITG--ACHCAAGFRGWRCE DRCEQG--TYG--NDCHQR--CQCQNG 200

I : : I I : I I : I I I I II hi I I I I I I 

Db 590 -PCRHGGKCLDLVDKYLCRCPSGTTGVNCEVNIDDCASNPCTFGVCRDGINRYDCVCQPG 648

Qy 201 AT CDHVTGECRCPPGYTGAFCED LCPPGKHGPQC EQRC PCQNGG 244 

I I : II I III I I I I I II I I I : I 

Db 649 FTGPLCNVEINECASSPCGEGGSCVDGENGFRCLCPPGSLPPLCLPPSHPCAHEPCSH-G 707

Qy 245 VCHHVTG--ECSCPSGWMGTVCGQ PCPEGRFGK 275

: I : I I I I I I I I Mil: 
Db 708 ICYDAPGGFRCVCEPGWSGPRCSQSLARDACESQPCRAGGTCSSDGMGFHCTCPPGVQGR 767 

Qy 276 NCS--QEC QCHNGGTCDAATGQ CHCSPGYTGERCQ DEC PVGTYGVLC 320

I I I : I I I : : I I I II I : I I I I I I I I I : I : I 

Db 768 QCELLSPCTPNPCEHGGRCESAPGQLPVCSCPQGWQGPRCQQDVDECAGPAPCGPHGI-C 826 

Qy 321 AE TCQ CVNGGKCYHVSG--ACLCEAGFAGERCEA 352

II hill I I : I I I I I I II I 

Db 827 TNLAGSFSCTCHGGYTGPSCDQDINDCDPNPCLNGGSCQDGVGSFSCSCLPGFAGPRC-A 885



Qy 353 R LCPEGLYGIKCDKRCPCHLENTHSCHPM 381 

I II I I I :: I II 
Db 886 RDVDECLSNPCGPGTCTDHVASFTCTCPPGYGGFHCEQDLP DCSPSSCFNGGT 938

Qy 382 SGECACKPGWSGLYCNE TCSPGFYGEACQ 410 

I I I : I I : : I : I II I I I I 

Db 939 CVDGVNSFSCLCRPGYTGAHCQHEADPCLSRPCLHGGVCSAAHPGFRCTCLESFTGPQCQ 998 

Qy 411 QI CS CQNGADCDSVTGKCTCAPGFKGIDC STP 442 

: II Mill I I I I : I I I I 

Db 999 TLVDWCSRQPCQNGGRCVQTGAYCLCPPGWSGRLCDIRSLPCREAAAQIGVRLEQLCQAG 1058 

Qy 443 CPLGTYGINCSSRCG CKNDAVCSPVDGS--CTCKAGWHGVD 481

I I I I : I I : : I I I I | : : I : 

Db 1059 GQCVDEDSSHYCVCPEGRTGSHCEQEVDPCLAQPCQHGGTCRGYMGGYMCECLPGYNGDN 1118 

Qy 482 CS IRCPSGTWGFGCNLT C QCLN 503 

I 111111:1 : I I : 

Db 1119 CEDDVDECASQPCQHGGSCIDLVARYLCSCPPGTLGVLCEINEDDCGPGPPLDSGPRCLH 1178

Qy 504 GGACNTLDG--TCTCAPGWRGEKCEL PCQDGTYGLNCAERCDCSHADGCHPTTG 555

I I I I I I I I I : I : I I I : I I : I I I 

Db 1179 NGTCVDLVGGFRCTCPPGYTGLRCEADINECRSGA CHAAHTRDCLQDPGGGF 1230

Qy 556 HCRCLPGWSGVHCDSV CAEGRWGPNC 581

I I I : I I I : I 11:1111 
Db 1231 RCLCHAGFSGPRCQTVLSPCESQPCQHGGQCRPSPGPGGGLTFTCHCAQPFWGPRC 1286



RESULT 8 

US-08-400-159-10 

; Sequence 10, Application US/08400159
; Patent No. 5869282
; GENERAL INFORMATION:
; APPLICANT: Ish-Horowicz, David
; APPLICANT: Henrique, Domingos M.P.
; APPLICANT: Lewis, Julian H.
; APPLICANT: Myat, Anna M.
; APPLICANT: Fleming, Robert J.
; APPLICANT: Artavanis-Tsakonas, Spyridon
; APPLICANT: Mann, Robert S.
; APPLICANT: Gray, Grace E.
; TITLE OF INVENTION: NUCLEOTIDE AND PROTEIN SEQUENCES OF THE
; TITLE OF INVENTION: SERRATE GENE AND METHODS BASED THEREON
; NUMBER OF SEQUENCES: 20
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: Pennie & Edmonds
; STREET: 1155 Avenue of the Americas
; CITY: New York
; STATE: New York
; COUNTRY: USA
; ZIP: 10036-2711
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/400,159
; FILING DATE: 07-MAR-1995
; CLASSIFICATION: 435
; ATTORNEY/AGENT INFORMATION:
; NAME: Misrock, S. Leslie
; REGISTRATION NUMBER: 18,872
; REFERENCE/DOCKET NUMBER: 7326-029
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: (212) 790-9090
; TELEFAX: (212) 869-9741/8864
; TELEX: 66141 PENNIE
; INFORMATION FOR SEQ ID NO: 10:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 1193 amino acids
; TYPE: amino acid
; TOPOLOGY: linear
; MOLECULE TYPE: protein
US-08-400-159-10 

Query Match 18.8%; Score 678; DB 2; Length 1193; 

Best Local Similarity 27.4%; Pred. No. 6.8e-36; 

Matches 197; Conservative 63; Mismatches 214; Indels 244; Gaps 46 

Qy 50 HPFDQI YYTSCTDILNWFKCTRHRVSYRTAYRHGEKTM YRRKSQCCPGFYESG 102 

III : I : : I : : : I I I : I : I I 
Db 184 HTCDQNGNKTCLEGWTGPECNKAICRQGCSPKHGSCTVPGECRCQYGWQGQYC 236 

Qy 103 EMCVPHCADKCVHGRCIAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCN- 161 

: I : I I I I I I I I I III III III : I : III I : 

Db 237 DKCIPH--PGCVHGTCIEPWQCLCETNWGG QLCDKDL--NYCGTHPPCLNGGTCSN 288 

Qy 162 --PITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATC-DHVTG-ECRCPPGYT 217

I I I I : I I I I : ! I | | : | : | | I | ! M : 

Db 289 TGPDKYQCSCPEGYSGQNCE-IAEHACLSDPCH NGGSCLETSTGFECVCAPGWA 341 

Qy 218 GAFCEDL CPPGKHGPQCEQRCPCQNGGVCHHVTG--ECSCPSGWMGTVC 264

III II | | : M I : : i ! I I I I 

Db 342 GPTCTDNIDDCSPN PCGHGGTCQDLVDGFKCICPPQWTGKTCQLDANECE 391 

Qy 265 GQPCPE GRFGKNCS QEC--QCHNGGTC-DAATG-QCHCSP 300

I : I I I I I I : I I I I II : I I I : I III 

Db 392 GKPCVNANSCRNLIGSYYCDCITGWSGHNCDININDCRGQCQNGGSCRDLVNGYRCICSP 451 

Qy 301 GYTGERCQ DECPVGTYGVLCAETCQCVNGGKCY-HVSG-ACLCEAGFAGERCEARL- 354

llhl: : II s I : I I I I : : I Mill: I : : 

Db 452 GYAGDHCEKDINEC ASNPCMNGGHCQDEINGFQCLCPAGFSGNLCQLDID 501 

Qy 355 CPEGLYGIKCD KRCPCHLENT 375 

III II : ||::: 

Db 502 YCEPNPCQNGAQCFNLAMDYFCNCPEDYEGKNCSHLKDHCRTTPCEVIDSCTVAVASNST 561 

Qy 376 HSCHPMSG ECACKPGWSGLYCNETCS PGFYGEACQQI CSCQN 417 

I : I I I I : : I I I : I : I : I : I 

Db 562 PEGVRYISSNVCGPHGKCKSQAGGKFTCECNKGFTGTYCHENIND CES-NPCKN 614



Qy 418 GADC-DSVTG-KCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPV--DGSCTC 473

I I I I i I I : I :: I I I I I I : I I I : I I I 

Db 615 GGTCIDGVNSYKCICSDGWEGTYCET NINDCSKNPCHNGGTCRDLVNDFFCEC 667 

Qy 474 KAGWHGVDCSIR CPSGTWGFGCNL TC---QC 501

i M I I I 11:1111: : I I 

Db 668 KNGWKGKTCHSRDSQCDEATCNNGGTCYDEGDTFKCMCPAGWEGATCNIARNSSCLPNPC 727 

Qy 502 LNGGACNTLDG TCTCAPGWRGEKCEL PC-QDGTYGLNCAE RCDC 544

I I I I : I I I I II I I : | | : | 

Db 728 HNGGTC-WSGDSFTCVCKEGWEGPTCTQNTNDCSPHPCYNSGT CVDGDNWYRCEC 782

Qy 545 S HADGCHPTTGHCR CLPGWSGVHCDSVCAEGRWGPNC SLPCY 586

: I I : I :: I I : I I I I I I I I : 

Db 783 APGFAGPDCRININECQSSPCAFGATCVDEINGYRC--ICPPGRSGPGCQEVTGRPCF 838 

RESULT 9 

US-08-611-729A-10 

; Sequence 10, Application US/08611729A
; Patent No. 6004924
; GENERAL INFORMATION:
; APPLICANT: Ish-Horowicz, David
; APPLICANT: Henrique, Domingos M.P.
; APPLICANT: Lewis, Julian H.
; APPLICANT: Myat, Anna M.
; APPLICANT: Fleming, Robert J.
; APPLICANT: Artavanis-Tsakonas, Spyridon
; APPLICANT: Mann, Robert S.
; APPLICANT: Gray, Grace E.
; TITLE OF INVENTION: NUCLEOTIDE AND PROTEIN SEQUENCES OF THE
; TITLE OF INVENTION: SERRATE GENE AND METHODS BASED THEREON
; NUMBER OF SEQUENCES: 20
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: Pennie & Edmonds
; STREET: 1155 Avenue of the Americas
; CITY: New York
; STATE: New York
; COUNTRY: U.S.A.
; ZIP: 10036-2711
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/611,729A
; FILING DATE: 06-MAR-1996
; CLASSIFICATION: 435
; ATTORNEY/AGENT INFORMATION:
; NAME: Misrock, S. Leslie
; REGISTRATION NUMBER: 18,872
; REFERENCE/DOCKET NUMBER: 7326-037
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: (212) 790-9090
; TELEFAX: (212) 869-9741/8864
; TELEX: 66141 PENNIE
; INFORMATION FOR SEQ ID NO: 10:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 1193 amino acids
; TYPE: amino acid
; TOPOLOGY: unknown
; MOLECULE TYPE: protein
US-08-611-729A-10 

Query Match 18.8%; Score 678; DB 3; Length 1193; 

Best Local Similarity 27.4%; Pred. No. 6.8e-36; 

Matches 197; Conservative 63; Mismatches 214; Indels 244; Gaps 46 

Qy 50 HPFDQIYYTSCTDILNWFKCTRHRVSYRTAYRHGEKTM YRRKSQCCPGFYESG 102

III : I : : I : : : I I I : I : I I 
Db 184 HTCDQNGNKTCLEGWTGPECNKAICRQGCSPKHGSCTVPGECRCQYGWQGQYC 236 

Qy 103 EMCVPHCADKCVHGRCIAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCN- 161

= I • II I I I I I I I III II I III : I : | | | | : 

Db 237 DKCIPH--PGCVHGTCIEPWQCLCETNWGG QLCDKDL--NYCGTHPPCLNGGTCSN 288

Qy 162 --PITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATC-DHVTG-ECRCPPGYT 217 

I 111:111 I : I I II : I : II I I I I I : 

Db 289 TGPDKYQCSCPEGYSGQNCE-IAEHACLSDPCH NGGSCLETSTGFECVCAPGWA 341

Qy 218 GAFCEDL CPPGKHGPQCEQRCPCQNGGVCHHVTG--ECSCPSGWMGTVC 264 

III M | | : I I I : : I I I I I I 

Db 342 GPTCTDNIDDCSPN PCGHGGTCQDLVDGFKCICPPQWTGKTCQLDANECE 391 

Qy 265 GQPCPE GRFGKNCS QEC--QCHNGGTC-DAATG-QCHCSP 300

I : I I I I I I : I I I I I I : I I I : I I I I 

Db 392 GKPCVNANSCRNLIGSYYCDCITGWSGHNCDININDCRGQCQNGGSCRDLVNGYRCICSP 451 

Qy 301 GYTGERCQ DECPVGTYGVLCAETCQCVNGGKCY-HVSG-ACLCEAGFAGERCEARL- 354 

I I I : I : : I I : I : I I I I : : I I II I I I : I I : : 

Db 452 GYAGDHCEKDINEC ASNPCMNGGHCQDEINGFQCLCPAGFSGNLCQLDID 501

Qy 355 CPEGLYGIKCD KRCPCHLENT 375 

III II : | | : : : 

Db 502 YCEPNPCQNGAQCFNLAMDYFCNCPEDYEGKNCSHLKDHCRTTPCEVIDSCTVAVASNST 561 

Qy 376 HSCHPMSG ECACKPGWSGLYCNETCS PGFYGEACQQI CSCQN 417

I : I II I : : I I I : I : I : I : I 

Db 562 PEGVRYISSNVCGPHGKCKSQAGGKFTCECNKGFTGTYCHENIND CES-NPCKN 614 

Qy 418 GADC-DSVTG-KCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPV--DGSCTC 473

I III II I : I : : I I I It I : II I : I I I 

Db 615 GGTCIDGVNSYKCICSDGWEGTYCET NINDCSKNPCHNGGTCRDLVNDFFCEC 667 

Qy 474 KAGWHGVDCSIR CPSGTWGFGCNL TC QC 501 

I I I : I I I I : : I I 

Db 668 KNGWKGKTCHSRDSQCDEATCNNGGTCYDEGDTFKCMCPAGWEGATCNIARNSSCLPNPC 727 

Qy 502 LNGGACNTLDG TCTCAPGWRGEKCEL PC-QDGTYGLNCAE RCDC 544 

I I I I = I I I I III I II || | : | | : | 

Db 728 HNGGTC-WSGDSFTCVCKEGWEGPTCTQNTNDCSPHPCYNSGT CVDGDNWYRCEC 782 

Qy 545 S HADGCHPTTGHCR CLPGWSGVHCDSVCAEGRWGPNC SLPCY 586 



: i I : | : : | | : I I I I I I ||:
Db 783 APGFAGPDCRININECQSSPCAFGATCVDEINGYRC--ICPPGRSGPGCQEVTGRPCF 838



RESULT 10 
US-08-185-432-19 

; Sequence 19, Application US/08185432
; Patent No. 5750652
; GENERAL INFORMATION:
; APPLICANT: Artavanis-Tsakonas, Spyridon
; APPLICANT: Busseau, Isabelle
; APPLICANT: Diederich, Robert J.
; APPLICANT: Xu, Tian
; APPLICANT: Matsuno, Kenji
; TITLE OF INVENTION: DELTEX PROTEINS, NUCLEIC ACIDS, AND
; TITLE OF INVENTION: ANTIBODIES, AND RELATED METHODS AND COMPOSITIONS
; NUMBER OF SEQUENCES: 23
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: PENNIE & EDMONDS
; STREET: 1155 Avenue of the Americas
; CITY: New York
; STATE: New York
; COUNTRY: U.S.A.
; ZIP: 10036-2711
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/185,432
; FILING DATE: 21-JAN-1994
; CLASSIFICATION: 530
; ATTORNEY/AGENT INFORMATION:
; NAME: Misrock, S. Leslie
; REGISTRATION NUMBER: 18,872
; REFERENCE/DOCKET NUMBER: 7326-006
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: (212) 790-9090
; TELEFAX: (212) 869-8864/9741
; TELEX: 66141 PENNIE
; INFORMATION FOR SEQ ID NO: 19:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 2703 amino acids
; TYPE: amino acid
; TOPOLOGY: unknown
; MOLECULE TYPE: protein
US-08-185-432-19 



Query Match 18.8%; Score 676; DB 1; Length 2703; 

Best Local Similarity 25.4%; Pred. No. 1.9e-35; 

Matches 208; Conservative 79; Mismatches 202; Indels 330; Gaps 51; 

Qy 7 SCL SFICLLLCHWIGTASPLNLED--PNVCSHWESYS VTVQESYPHPFDQI YYTSC 60 

III : | 1 : : : | | : : : : : | | : : | 

Db 502 SCLDDPGTFRCVCMPGFTGTQCEIDIDECQSNPC LNDGTC 541 



Qy 61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMC VPHCADKCVHGR 117

I : I I I I : I I I : I I : I : i 

Db 542 HDKINGFKCS CALGF--TGARCQINIDDCQSQPCRNR 576

Qy 118 CIAPNTCQCEPGWGGTNCS SACDGDHWGPHCTSRCQCKNGALCNPITG-ACH 168

I I : I : I i I : I I : I : I I : II : : I 

Db 577 GICHDSIAGYSCECPPGYTGTSCEININDCDSN PCHRGKCIDDVNSFKCL 626

Qy 169 CAAGFRGW RCEDR CEQGTYG NDCHQRCQ 196 

I I : I : I : I I hill h I I 

Db 627 CDPGYTGYICQKQINECESNPCQFDGHCQDRVGSYYCQCQAGTSGKNCEVNVNECHSN-P 685 

Qy 197 CQNGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVC-HHVTG-E 252 

I I I I I I I : : | : I I I : | | I | | : : | I I I I I I I I : 

Db 686 CNNGATCIDGINSYKCQCVPGFTGQHCE KNVDECI S-S PCANNGVCIDQVNGYK 738 

Qy 253 CSCPSGWMGTVC GQP CPEGRFGKNCS QECQ-- 282

Mil: I I I I I I I I I I 

Db 739 CECPRGFYDAHCLSDVDECASNPCVNEGRCEDGINEFICHCPPGYTGKRCELDIDECSSN 798

Qy 283 -CHNGGTC-DAATG-QCHCSPGYTGERCQ DECPVGTYGVLCAETCQCVNGGKCY-HV 335 

I : I I I I I I I I I I I I : : I : hi I I I I I I I 

Db 799 PCQHGGTCYDKLNAFSCQCMPGYTGQKCETNIDDC VTNPCGNGGTCIDKV 848

Qy 336 SG-ACLCEAGFAGERCEARLCPEGLYGIKCDK-RCPCHLENTHSCHPMSG ECACKP 389

= I 1 = 1: II | | : : : | I : I I : I III III 

Db 849 NGYKCVCKVPFTGRDCESKMDP CARNRC KNEAKCTPSSNFLDFSCTCKL 897 

Qy 390 GWSGLYCNE TCSPGFYGEAC QQICS CQN 417 

h : I I h I | : | : | | | : || I 

Db 898 GYTGRYCDEDIDECSLSSPCRNGASCLNVPGSYRCLCTKGYEGRDCAINTDDCASFPCQN 957

Qy 418 GADCDSVTG--KCTCAPGFKGIDCST PCPLGTYGI 450

II I I I I I I I I I II I I I 

Db 958 GRTCLDGIGDYSCLCVDGFDGKHCETDINECLSQPCQNGATCSQYVNSYTCTCPLGFSGI 1017 

Qy 451 NCS SRCGCKNDAVCSPVDG SCTCKAGWHGVDCSIR 485 

II = II I : I I : |: I I h I : I : 

Db 1018 NCQTNDEDCTESSCLNGGSC--IDGINGYNCSCLAGYSGANCQYKLNKCDSNPCLNGATC 1075

Qy 486 CPSGTWGFGCNL TCQCLNGGACNTL--DGTCTCAP GWRGEKCE- 526

I I I I I h III I : : : I h I I h I : 

Db 1076 HEQNNEYTCHCPSGFTGKQCSEYVDWCGQSPCENGATCSQMKHQFSCKCSAGWTGKLCDV 1135

Qy 527 --LPCQDGT--YGLNCAERCD CSHADGCHPTTGHCRCLPGWSGVHC 568 

: I I I I I : : |: | | I I h : I : I 

Db 1136 QTISCQDAADRKGLSLRQLCNNGTCKDYGNSHV CYCSQGYAGS YCQKEIDECQSQP 1191 

Qy 569 DSVCAEGRWGPNCSL PC 585 

: I : I I I I I II 
Db 1192 CQNGGTCRDLIGAYECQCRQGFQGQNCELNIDDCAPNPC 1230 

RESULT 11 
US-08-899-232-4 

; Sequence 4, Application US/08899232 
; Patent No. 6436650 



; GENERAL INFORMATION: 

; APPLICANT: Artavanis-Tsakonas, Spyridon
; APPLICANT: Qi, Huilin
; TITLE OF INVENTION: ACTIVATED FORMS OF NOTCH AND METHODS BASED THEREON
; FILE REFERENCE: 7326-046
; CURRENT APPLICATION NUMBER: US/08/899,232
; CURRENT FILING DATE: 1997-07-23
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 4
; LENGTH: 2703
; TYPE: PRT
; ORGANISM: Drosophila sp.
US-08-899-232-4 



Query Match 18.8%; Score 676; DB 4; Length 2703; 

Best Local Similarity 25.4%; Pred. No. 1.9e-35; 

Matches 208; Conservative 79; Mismatches 202; Indels 330; Gaps 51 

Qy 7 SCL SFICLLLCHWIGTASPLNLED -- PNVCSHWESYS VTVQESYPHPFDQI YYTSC 60 

III : I I : : : | | : : : s : | | . . , 

Db 502 SCLDDPGTFRCVCMPGFTGTQCEIDIDECQSNPC LNDGTC 541 

Qy 61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMC VPHCADKCVHGR 117 

I :l IN: | || :| | ; | : , 

Db 542 HDKINGFKCS CALGF — TGARCQINIDDCQSQPCRNR 576 

Qy 118 CIAPNTCQCEPGWGGTNCS--SACDGDHWGPHCTSRCQCKNGALCNPITG-ACH 168 

I I : I : I I I : I I : I : | | : | | : - | 

Db 577 GICHDSIAGYSCECPPGYTGTSCEININDCDSN PCHRGKCIDDVNS FKCL 626 

Qy 169 CAAGFRGW RCEDR CEQGTYG NDCHQRCQ 196 

' I : I : I : I I hill |:|| 

Db 627 CDPGYTGYICQKQINECESNPCQFDGHCQDRVGSYYCQCQAGTSGKNCEVNVNECHSN-P 685 

Qy 197 CQNGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVC-HHVTG-E 252 

' I I I I I I : :hl I I: I I II |: :| | : 

Db 686 CNNGATCIDGINSYKCQCVPGFTGQHCE KNVDECI S-SPCANNGVCIDQVNGYK 738 

Qy 253 CSCPSGWMGTVC GQP CPEGRFGKNCS QECQ — 282 

I I I I : I | I I I I I I || 

Db 739 CECPRGFYDAHCLSDVDECASNPCVNEGRCEDGINEFICHCPPGYTGKRCELDIDECSSN 798 

QY 283 -CHNGGTC-DAATG-QCHCSPGYTGERCQ DECPVGTYGVLCAETCQCVNGGKCY-HV 335 

IMMII I I II I I I : : I : |:| I I III I I 

Db 799 PCQHGGTCYDKLNAFSCQCMPGYTGQKCETNIDDC VTNPCGNGGTCIDKV 848 

Qy 336 SG-ACLCEAGFAGERCEARLCPEGLYGIKCDK-RCPCHLENTHSCHPMSG ECACKP 389 

:| l:h II II::: I | : || :| | | | , || 

Db 849 NGYKCVCKVPFTGRDCESKMDP CARNRC KNEAKCTPSSNFLDFSCTCKL 897 

Qy 390 GWSGLYCNE TCSPGFYGEAC QQICS CQN 417 

I : : 1 1 1 : 1 I : I : I I | : III 

Db 898 GYTGRYCDEDI DECS LSSPCRN GAS CLNVPGSYRCLCTKGYEGRDCAINTDDCASFPCQN 957 

QY 418 GADCDSVTG— KCTCAPGFKGIDCST PCPLGTYGI 450 

II I I I II I I I | | M II 



Db 958 GRTCLDGIGDYSCLCVDGFDGKHCETDINECLSQPCQNGATCSQYVNSYTCTCPLGFSGI 1017 



Qy 451 NCS SRCGCKNDAVCSPVDG SCTCKAGWHGVDCSIR 4 85 

II : II I : I I : I : I I I : I : I : 

Db 1018 NCQTNDEDCTESSCLNGGSC — IDGINGYNCSCLAGYSGANCQYKLNKCDSNPCLNGATC 1075 

Qy 486 CPSGTWGFGCNL TCQCLNGGACNTL--DGTCTCAPGWRGEKCE- 52 6 

I I I I I I : I I I I : : : I I : I I | : | : 

Db 1076 HEQNNEYTCHCPSGFTGKQCSEYVDWCGQSPCENGATCSQMKHQFSCKCSAGWTGKLCDV 1135 

Qy 52 7 --LPCQDGT--YGLNCAERCD CSHADGCHPTTGHCRCLPGWSGVHC 5 68 

: I I I I I : : I : I I I I I : : I : I 

Db 1136 QTISCQDAADRKGLSLRQLCNNGTCKDYGNSHV CYCSQGYAGS YCQKEIDECQSQP 1191 

Qy 569 DSVCAEGRWGPNCSL PC 585 

: I : I I I I I II 
Db 1192 CQNGGTCRDLIGAYECQCRQGFQGQNCELNIDDCAPNPC 1230 



RESULT 12 
US-08-185-432-16 

; Sequence 16, Application US/08185432
; Patent No. 5750652
; GENERAL INFORMATION:
; APPLICANT: Artavanis-Tsakonas, Spyridon
; APPLICANT: Busseau, Isabelle
; APPLICANT: Diederich, Robert J.
; APPLICANT: Xu, Tian
; APPLICANT: Matsuno, Kenji
; TITLE OF INVENTION: DELTEX PROTEINS, NUCLEIC ACIDS, AND
; TITLE OF INVENTION: ANTIBODIES, AND RELATED METHODS AND COMPOSITIONS
; NUMBER OF SEQUENCES: 23
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: PENNIE & EDMONDS
; STREET: 1155 Avenue of the Americas
; CITY: New York
; STATE: New York
; COUNTRY: U.S.A.
; ZIP: 10036-2711
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/185,432
; FILING DATE: 21-JAN-1994
; CLASSIFICATION: 530
; ATTORNEY/AGENT INFORMATION:
; NAME: Misrock, S. Leslie
; REGISTRATION NUMBER: 18,872
; REFERENCE/DOCKET NUMBER: 7326-006
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: (212) 790-9090
; TELEFAX: (212) 869-8864/9741
; TELEX: 66141 PENNIE
; INFORMATION FOR SEQ ID NO: 16:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 2471 amino acids
; TYPE: amino acid
; TOPOLOGY: unknown
; MOLECULE TYPE: protein
US-08-185-432-16 

Query Match 18.5%; Score 666.5; DB 1; Length 2471; 

Best Local Similarity 25.8%; Pred. No. 7.1e-35; 

Matches 225; Conservative 74; Mismatches 250; Indels 323; Gaps 57; 
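
The two percentages in each result header appear to follow directly from the other figures printed alongside them. Query Match looks like the hit score divided by the perfect score (3601 for this query), and Best Local Similarity looks like the number of identical positions divided by the total number of alignment columns (matches, conservative substitutions, mismatches and indels). This relationship is inferred from the numbers in this report, not taken from GenCore documentation, and the function names below are hypothetical. A minimal sketch:

```python
def query_match_pct(score, perfect_score):
    """Hit score as a percentage of the query's self-alignment score."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Figures taken from the RESULT 12 header above.
print(round(query_match_pct(666.5, 3601), 1))
print(round(best_local_similarity_pct(225, 74, 250, 323), 1))
```

The same check can be repeated against any other result header in the report; a mismatch would suggest an OCR error in one of the counts.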

Qy 3 ISLNSCLSFICL LLCH WIGTASPLNLE--DPNVCSHW ES 39 

I : : I I I I : I : I | : : | J : | | : | 

Db 531 IDIDDCSSTPCLNGAKCIDHPNGYECQCATGFTGVLCEENIDNCDPDPCHHGQCQDGIDS 590 

Qy 4 0 YSVTVQESYPHPF— DQI--YYTS CTDILNWFKC TRHRVSYRTAY 8 0 

I : I III I : I | | : : | : : | : : : 

Db 591 YTCICNPGYMGAICSDQIDECYSSPCLNDGRCIDLVNGYQCNCQPGTSGVNCEINFDDCA 650 

Qy 81 R HG — EKTMYRRKSQCCPGF YESGEMCV 10 6 

II : I I I II ||: 

Db 651 SNPCIHGICMDGINRYSCVCSPGFTGQRCNIDIDECASNPCRKGATCINGVNGFRCICPE 710 

Qy 107 PHC ADKCVHGRC IAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSR 151 

M : : | : | | I : : I h II I II I : II 
Db 711 GPHHPSCYSQVNECLSNPCIHGNCTGGLSGYKCLCDAGWVGINCE — VDKN ECLSN 7 64 

Qy 152 CQCKNGALC-NPITG-ACHCAAGFRGWRCE DRC EQGTYGNDCH-QRCQC — 197 

I : I I I I : I II I I : I : I : II III : | | | 

Db 765 -PCQNGGTCDNLVNGYRCTCKKGFKGYNCQVNIDECASNPCLNQGTCFDDISGYTCHCVL 823 

Qy 198 -QNGATCDHVTGECRCPPGYTGAFCED LCPPGKHGPQCE QRC PCQ 241 

iii i i ii ii i • i i i i 

Db 824 PYTGKNCQTVLAPCSPNPCENAAVCKESPNFESYTCLCAPGWQGQRCTIDIDECISKPCM 883 

Qy 242 NGGVCHHVTGE--CSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTC--DAATG 294 

I I : I I : I I I I ( : I I : : I I I I I : I | 

Db 884 NHGLCHNTQGSYMCECPPGFSGMDCEEDI DDCLANPCQNGGSCMDGVNTF 933 

Qy 295 QCHCSPGYTGERCQ DECPVGTYGVLCAETCQCVNGGKC--YHVSGACLCEAGFAGER 349 

I I I I : I I : : I I : I I : I 1 : I I I I I I I I : I I I I 

Db 934 SCLCLPGFTGDKCQTDMNEC LSEPCK--NGGTCSDYVNSYTCKCQAGFDGVH 983 

Qy 350 CEARL CPEGLYGIKCDKRCP CHLENTHSCHPMSGECACK 388 

M: I = I : I II I I I I I II 

Db 984 CENNINECTESSCFNGGTCVDGINSFSC--LCPVGFTGSFCLHEINECSSHPCLNEGTCV 1041 

Qy 389 PGWSGLYCNETCSPGFYGEACQ QICS CQNGADC--DSVTGKCTCAPGFKGIDCS 440 

i i • i i • i • i i «ii I • i i -ii i • i i 

Db 1042 DGLGTYRC--SCPLGYTGKNCQTLVNLCSRSPCKNKGTCVQKKAESQCLCPSGWAGAYCD 1099 

Qy 460 NDAVCSPVDGS--CTCKAGWHGVDCSIR- CPSGTWG 492 

: I I I I II I : I I : I Mill 

Db 1160 HGATCSDFIGGYRCECVPGYQGVNCEYEVDECQNQPCQNGGTCIDLVNHFKCSCPPGTRG 1219 

Qy 493 FGC--NL-TC QCLNGGACNTLDG--TCTCAPGWRGEKCE LPC-QDGTY 534 

Qy 535 GLNCAE RCDCSHA DGC HPTTGHCRCLPGW 563 

Db 1279 -LDCIQLTNDYLCVCRSAFTGRHCETFVDVCPQMPCLNGGTCAVASNMPDGFICRCPPGF 1337 

Qy 564 SGVHCDSVCAEGRW GPNCSLP 584 

Db 1338 SGARCQSSCGQVKCRKGEQCVHTASGPRCFCP 1369 



RESULT 13 
US-08-083-590A-19 

; Sequence 19, Application US/08083590A
; Patent No. 5786158
; GENERAL INFORMATION:
; APPLICANT: Artavanis-Tsakonas, S. et al.
; TITLE OF INVENTION: Therapeutic And Diagnostic Methods
; TITLE OF INVENTION: And Compositions Based On Notch Proteins And
; TITLE OF INVENTION: Nucleic Acids
; NUMBER OF SEQUENCES: 21
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: Pennie & Edmonds
; STREET: 1155 Avenue of the Americas
; CITY: New York
; STATE: New York
; COUNTRY: U.S.A.
; ZIP: 10036
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.25
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/083,590A
; FILING DATE: 25-JUN-1993
; CLASSIFICATION: 435
; ATTORNEY/AGENT INFORMATION:
; NAME: Misrock, S. Leslie
; REGISTRATION NUMBER: 18,872
; REFERENCE/DOCKET NUMBER: 7326-015
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: 212 790-9090
; TELEFAX: 212 8698864/9741
; TELEX: 66141 PENNIE
; INFORMATION FOR SEQ ID NO: 19:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 2471 amino acids
; TYPE: amino acid
; STRANDEDNESS: single
; TOPOLOGY: unknown
; MOLECULE TYPE: peptide
US-08-083-590A-19 



Query Match 18.5%; Score 666.5; DB 1; Length 2471; 

Best Local Similarity 25.8%; Pred. No. 7.1e-35; 

Matches 225; Conservative 74; Mismatches 250; Indels 323; Gaps 57; 

Qy 3 ISLNSCLSFICL LLCH WIGTASPLNLE — DPNVCSHW ES 39 

I : : I I I I : I : I I : : I I : I I : I 

Db 531 IDIDDCSSTPCLNGAKCIDHPNGYECQCATGFTGVLCEENIDNCDPDPCHHGQCQDGIDS 590 

Qy 4 0 YSVTVQESYPHPF — DQI--YYTS CTDILNWFKC TRHRVSYRTAY 8 0 

I : I III I : I | | : : | : : | : : : 

Db 591 YTCICNPGYMGAICSDQIDECYSSPCLNDGRCIDLVNGYQCNCQPGTSGVNCEINFDDCA 650 

Qy 81 R HG--EKTMYRRKSQCCPGF YESGEMCV 106 

II : I I I I I II: 

Db 651 SNPCIHGICMDGINRYSCVCSPGFTGQRCNIDIDECASNPCRKGATCINGVNGFRCICPE 710 

Qy 107 PHC ADKCVHGRC IAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSR 151 

II : : | : | | | : : I h III II I : II 
Db 711 GPHHPSCYSQVNECLSNPCIHGNCTGGLSGYKCLCDAGWVGINCE--VDKN ECLSN 764 

Qy 152 CQCKNGALC-NPITG-ACHCAAGFRGWRCE DRC EQGTYGNDCH-QRCQC — 197 

1:11 I I : I II I I : I : I : II III : I II 

Db 765 -PCQNGGTCDNLVNGYRCTCKKGFKGYNCQVNIDECASNPCLNQGTCFDDISGYTCHCVL 823 

Qy 198 -QNGATCDHVTGECRCPPGYTGAFCED LCPPGKHGPQCE QRC PCQ 241 

III I I II:: I I I I I : I I II 

Db 824 PYTGKNCQTVLAPCSPNPCENAAVCKESPNFESYTCLCAPGWQGQRCTIDIDECISKPCM 8 83 

Qy 242 NGGVCHHVTGE--CSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTC — DAATG 294 

I I : I I : I I I I I : I I : : I I I I I : I I 

Db 884 NHGLCHNTQGSYMCECPPGFSGMDCEEDI DDCLANPCQNGGSCMDGVNTF 933 

Qy 295 QCHCSPGYTGERCQ DECPVGTYGVLCAETCQCVNGGKC-- YHVSGACLCEAGFAGER 34 9 

I I II : I I : : I I : I I : I I : I I I I I I I I : I I I I 

Db 934 SCLCLPGFTGDKCQTDMNEC LSEPCK — NGGTCSDYVNS YTCKCQAGFDGVH 983 

Qy 350 CEARL CPEGLYGIKCDKRCP CHLENTHSCHPMSGECACK 38 8 

II: I : I : III I I I II I I 

Db 984 CENNINECTESSCFNGGTCVDGINSFSC — LCPVGFTGSFCLHEINECSSHPCLNEGTCV 1041 

Qy 389 PGWSGLYCNETCSPGFYGEACQ QICS CQNGADC— DSVTGKCTCAPGFKGIDCS 440 

I I : I I : I : I I : I I I : I I : I I I : I I 

Db 1042 DGLGTYRC--SCPLGYTGKNCQTLVNLCSRSPCKNKGTCVQKKAESQCLCPSGWAGAYCD 1099 

Qy 441 TP CPLGTYGINCSSR CG CK 459 

I I I I I I I : I I : 

Db 1100 VPNVSCDIAASRRGVLVEHLCQHSGVCINAGNTHYCQCPLGYTGSYCEEQLDECASNPCQ 1159 

Qy 4 60 NDAVCSPVDGS--CTCKAGWHGVDCSIR CPSGTWG 4 92 

: I I I I I I I : I I : I II II I 

Db 1160 HGATCSDFIGGYRCECVPGYQGVNCEYEVDECQNQPCQNGGTCIDLVNHFKCSCPPGTRG 1219 

Qy 4 93 FGC--NL-TC QCLNGGACNTLDG--TCTCAPGWRGEKCE LPC-QDGTY 534 

I I : I I I I I 1 I I : I I II : I I : I I II : I : 

Db 1220 LLCEENIDDCARGPHCLNGGQCMDRIGGYSCRCLPGFAGERCEGDINECLSNPCSSEGS- 1278 

Qy 535 GLNCAE RCDCSHA DGC HPTTGHCRCLPGW 563 

Db 1279 -LDCIQLTNDYLCVCRSAFTGRHCETFVDVCPQMPCLNGGTCAVASNMPDGFICRCPPGF 1337 

Qy 564 SGVHCDSVCAEGRW GPNCSLP 584 

Db 1338 SGARCQSSCGQVKCRKGEQCVHTASGPRCFCP 1369 



RESULT 14 
US-08-532-384-19 

; Sequence 19, Application US/08532384
; Patent No. 6083904
; GENERAL INFORMATION:
; APPLICANT: Artavanis-Tsakonas, S. et al.
; TITLE OF INVENTION: Therapeutic And Diagnostic Methods
; TITLE OF INVENTION: And Compositions Based On Notch Proteins And
; TITLE OF INVENTION: Nucleic Acids
; NUMBER OF SEQUENCES: 21
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: Pennie & Edmonds
; STREET: 1155 Avenue of the Americas
; CITY: New York
; STATE: New York
; COUNTRY: U.S.A.
; ZIP: 10036
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.25
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/532,384
; FILING DATE:
; CLASSIFICATION: 424
; PRIOR APPLICATION DATA:
; APPLICATION NUMBER: 08/083,590
; FILING DATE: 25-JUN-1993
; ATTORNEY/AGENT INFORMATION:
; NAME: Misrock, S. Leslie
; REGISTRATION NUMBER: 18,872
; REFERENCE/DOCKET NUMBER: 7326-015
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: 212 790-9090
; TELEFAX: 212 8698864/9741
; TELEX: 66141 PENNIE
; INFORMATION FOR SEQ ID NO: 19:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 2471 amino acids
; TYPE: amino acid
; STRANDEDNESS: single
; TOPOLOGY: unknown
; MOLECULE TYPE: peptide
US-08-532-384-19 



Query Match 18.5%; Score 666.5; DB 3; Length 2471; 

Best Local Similarity 25.8%; Pred. No. 7.1e-35; 



Matches 225; Conservative 74; Mismatches 250; Indels 323; Gaps 57; 



Qy 3 ISLNSCLSFICL LLCH WIGTASPLNLE — DPNVCSHW ES 39 

I : : I I I I : I : I I : : I I : I I : I 

Db 531 IDIDDCSSTPCLNGAKCIDHPNGYECQCATGFTGVLCEENIDNCDPDPCHHGQCQDGIDS 590 

Qy 4 0 YSVTVQESYPHPF — DQI — YYTS CTDILNWFKC TRHRVSYRTAY 8 0 

I : I I I I I : I | | : : | : : | : : : 

Db 591 YTCICNPGYMGAICSDQIDECYSSPCLNDGRCIDLVNGYQCNCQPGTSGVNCEINFDDCA 650 

Qy 81 R HG--EKTMYRRKSQCCPGF YESGEMCV 106 

II : I I I I I II: 

Db 651 SNPCIHGICMDGINRYSCVCSPGFTGQRCNIDIDECASNPCRKGATCINGVNGFRCICPE 710 

Qy 107 PHC ADKCVHGRC IAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSR 151 

II : : | : M | : : I I : I I I M I : II 
Db 711 GPHHPSCYSQVNECLSNPCIHGNCTGGLSGYKCLCDAGWVGINCE- — VDKN ECLSN 764 

Qy 152 CQCKNGALC-NPITG-ACHCAAGFRGWRCE DRC EQGTYGNDCH-QRCQC — 197 

I : I I I I : I I I I I : I : I : II I I I : I II 

Db 765 -PCQNGGTCDNLVNGYRCTCKKGFKGYNCQVNIDECASNPCLNQGTCFDDISGYTCHCVL 823 

Qy 198 -QNGATCDHVTGECRCPPGYTGAFCED LCPPGKHGPQCE QRC PCQ 241 

ill I I II:: IIM!:I I II 

Db 824 PYTGKNCQTVLAPCSPNPCENAAVCKESPNFESYTCLCAPGWQGQRCTIDIDECISKPCM 883 

Qy 242 NGGVCHHVTGE — CSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTC--DAATG 294 

I I : I I : I I I I I : I I : : I I I II : I I 

Db 884 NHGLCHNTQGSYMCECPPGFSGMDCEEDI DDCLANPCQNGGSCMDGVNTF 933 

Qy 295 QCHCSPGYTGERCQ DECPVGTYGVLCAETCQCVNGGKC--YHVSGACLCEAGFAGER 349 

I I I I : I I : : I I : I I : I I : I I I I I I I I : M I I 

Db 934 SCLCLPGFTGDKCQTDMNEC LSEPCK--NGGTCSDYVNSYTCKCQAGFDGVH 983 

Qy 350 CEARL CPEGLYGIKCDKRCP CHLENTHSCHPMSGECACK 388 

I I : I : I : III I I I I I I I 

Db 984 CENNINECTESSCFNGGTCVDGINSFSC — LCPVGFTGS FCLHEINECSSHPCLNEGTCV 1041 

Qy 389 PGWSGLYCNETCSPGFYGEACQ QICS CQNGADC--DSVTGKCTCAPGFKGIDCS 440 

I | : | | : | : | | : | | | : | | : I I I : I I 

Db 1042 DGLGTYRC--SCPLGYTGKNCQTLVNLCSRSPCKNKGTCVQKKAESQCLCPSGWAGAYCD 1099 

Qy 441 TP CPLGTYGINCSSR CG CK 459 

I I I I I I I : I 1 : 

Db 1100 VPNVSCDIAASRRGVLVEHLCQHSGVCINAGNTHYCQCPLGYTGSYCEEQLDECASNPCQ 1159 

Qy 4 60 NDAVCSPVDGS--CTCKAGWHGVDCSIR CPSGTWG 4 92 

: I I I I I I I : I I : I I I I I I 

Db 1160 HGATCSDFIGGYRCECVPGYQGVNCEYEVDECQNQPCQNGGTCIDLVNHFKCSCPPGTRG 1219 

Qy 493 FGC--NL-TC QCLNGGACNTLDG — TCTCAPGWRGEKCE LPC-QDGTY 534 

I I : I I I I I I I I : | I I I : | | : 1 | II : I : 

Db 1220 LLCEENIDDCARGPHCLNGGQCMDRIGGYSCRCLPGFAGERCEGDINECLSNPCSSEGS- 1278 

Qy 535 GLNCAE — RCDCSHA DGC HPTTGHCRCLPGW 563 

I : I : III II I Mill: 

Db 1279 -LDCIQLTNDYLCVCRSAFTGRHCETFVDVCPQMPCLNGGTCAVASNMPDGFICRCPPGF 1337 



Qy 564 SGVHCDSVCAEGRW GPNCSLP 584 

Mill:: I I I I 

Db 1338 SGARCQSSCGQVKCRKGEQCVHTASGPRCFCP 1369 



RESULT 15 
US-08-899-232-1 

; Sequence 1, Application US/08899232
; Patent No. 6436650
; GENERAL INFORMATION:
; APPLICANT: Artavanis-Tsakonas, Spyridon
; APPLICANT: Qi, Huilin
; TITLE OF INVENTION: ACTIVATED FORMS OF NOTCH AND METHODS BASED THEREON
; FILE REFERENCE: 7326-046
; CURRENT APPLICATION NUMBER: US/08/899,232
; CURRENT FILING DATE: 1997-07-23
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 1
; LENGTH: 2471
; TYPE: PRT
; ORGANISM: Homo sapiens
US-08-899-232-1 



Query Match 18.5%; Score 666.5; DB 4; Length 2471; 

Best Local Similarity 25.8%; Pred. No. 7.1e-35; 

Matches 225; Conservative 74; Mismatches 250; Indels 323; Gaps 57; 



Qy 3 ISLNSCLSFICL LLCH WIGTASPLNLE--DPNVCSHW ES 39 

I : : 1 1 1 1 : 1 : 1 1 : : 1 1 : 1 1 : 1 

Db 531 IDIDDCSSTPCLNGAKCIDHPNGYECQCATGFTGVLCEENIDNCDPDPCHHGQCQDGIDS 590 

Qy 40 YSVTVQESYPHPF--DQI--YYTS CTDILNWFKC TRHRVSYRTAY 80 

1 : 1 III 1 : 1 | | : : | : : | : : : 

Db 591 YTCICNPGYMGAICSDQIDECYSSPCLNDGRCIDLVNGYQCNCQPGTSGVNCEINFDDCA 650 

Qy 81 R HG--EKTMYRRKSQCCPGF YESGEMCV 106 

II : 1 1 1 II IN 

Db 651 SNPCIHGICMDGINRYSCVCSPGFTGQRCNIDIDECASNPCRKGATCINGVNGFRCICPE 710 

Qy 107 PHC ADKCVHGRC IAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSR 151 

II : : 1 : 1 1 1 :: 1 h II 1 II 1 : 1 1 

Db 711 GPHHPSCYSQVNECLSNPCIHGNCTGGLSGYKCLCDAGWVGINCE--VDKN ECLSN 764 

Qy 152 CQCKNGALC-NPITG-ACHCAAGFRGWRCE DRC EQGTYGNDCH-QRCQC-- 197 

1 : 1 1 1 1 : 1 II 1 1 : 1 : 1 : II III : 1 1 1 

Db 765 -PCQNGGTCDNLVNGYRCTCKKGFKGYNCQVNIDECASNPCLNQGTCFDDISGYTCHCVL 823 

Qy 198 -QNGATCDHVTGECRCPPGYTGAFCED LCPPGKHGPQCE QRC---PCQ 241 

III 1 1 II:: 1 1 1 1 1 : 1 1 II 

Db 824 PYTGKNCQTVLAPCSPNPCENAAVCKESPNFESYTCLCAPGWQGQRCTIDIDECISKPCM 883 

Qy 242 NGGVCHHVTGE--CSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTC--DAATG 294 

1 1 : 1 i : 1 1 1 1 1 : 1 1 : : 1 1 1 1 1 : 1 1 

Db 884 NHGLCHNTQGSYMCECPPGFSGMDCEEDI DDCLANPCQNGGSCMDGVNTF 933 



Qy 2 95 QCHCSPGYTGERCQ — DECPVGTYGVLCAETCQCVNGGKC— YHVSGACLCEAGFAGER 34 9 

I I I 1 : I I : : I I : I I : I I : I I i : I I I I 

Db 934 SCLCLPGFTGDKCQTDMNEC -LSEPCK— NGGTCSDYVNSYTCKCQAGFDGVH 983 

Qy 350 CEARL CPEGLYGIKCDKRCP CHLENTHSCHPMSGECACK 388 

M : I : I : III 

Db 984 CENNINECTESSCFNGGTCVDGINSFSC--LCPVGFTGSFCLHEINECS SHPCLNEGTCV 1041 

Qy 389 PGWSGLYCNETCSPGFYGEACQ— -QICS — CQNGADC— DSVTGKCTCAPGFKGIDCS 440 

I | : | | : | : I I : I I I : I I : I I I : I I 

Db 1042 DGLGTYRC— SCPLGYTGKNCQTLVNLCSRSPCKNKGTCVQKKAESQCLCPSGWAGAYCD 1099 

Qy 441 TP CPLGTYGINCSSR CG CK 459 

| MM I I : I I: 

Db 1100 VPNVSCDIAASRRGVLVEHLCQHSGVCINAGNTHYCQCPLGYTGSYCEEQLDECASNPCQ 1159 

Qy 460 NDAVCSPVDGS — CTCKAGWHGVDCS IR CPSGTWG 492 

: I I I I I I I : I I : I Mill 
Db 1160 HGATCS DFI GGYRCECVPGYQGVNCEYEVDECQNQPCQNGGTCI DLVNHFKCS CPPGTRG 1219 

Qy 4 93 FGC— NL-TC QCLNGGACNTLDG--TCTCAPGWRGEKCE LPC-QDGTY 534 

I I : I I I I I I I I : I I IM II M I MM: 

Db 1220 LLCEENIDDCARGPHCLNGGQCMDRIGGYSCRCLPGFAGERCEGDINECLSNPCSSEGS- 1278 

Qy 535 GLNCAE RCDCSHA DGC HPTTGHCRCLPGW 563 

Ml: III II I Mill: 

Db 1279 -LDCIQLTNDYLCVCRSAFTGRHCETFVDVCPQMPCLNGGTCAVASNMPDGFICRCPPGF 1337 

Qy 564 SGVHCDSVCAEGRW GPNCSLP 584 

Mill:: I I I I 

Db 1338 SGARCQSSCGQVKCRKGEQCVHTASGPRCFCP 1369 



Search completed: March 26, 2004, 16:13:10 
Job time : 18.241 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: March 26, 2004, 16:05:25 ; Search time 11.883 Seconds 
(without alignments) 
4743.616 Million cell updates/sec 

Title: US-10-092-390-4 
Perfect score: 3601 
Sequence: 1 MVISLNSCLSFICLLLCHWI HCDSVCAEGRWGPNCSLPCY 586 

Scoring table: BLOSUM62 
Gapop 10.0 , Gapext 0.5 

Searched: 283366 seqs, 96191526 residues 

Total number of hits satisfying chosen parameters: 283366 

Minimum DB seq length: 0 
Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 
Maximum Match 100% 
Listing first 45 summaries 

Database: 

PIR_78 : * 
pir1 : * 
pir2 : * 
pir3 : * 
pir4 : * 
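
The "Million cell updates/sec" figure in the run header is consistent with the usual throughput measure for dynamic-programming searches: one cell per pairing of a query residue with a database residue, divided by the search time. That reading is an assumption drawn from the numbers in this header (query length 586, 96191526 residues searched, 11.883 seconds), and the function name is hypothetical. A rough sketch:

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds):
    """Dynamic-programming cells filled per second, in millions."""
    # One cell for every (query residue, database residue) pair.
    cells = query_len * db_residues
    return cells / seconds / 1e6


rate = million_cell_updates_per_sec(586, 96_191_526, 11.883)
print(f"{rate:.3f} million cell updates/sec")
```

The result agrees with the reported 4743.616 to about four significant figures; the small remaining difference is what one would expect if the printed search time had been rounded.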



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
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notch protein homo 


Result No.  Score   Query Match %  Length  DB  ID      Description
16          672.5   18.7           4135    2   T42629  tenascin-X - bovin
17          664.5   18.5           1064    2   A40136  fibropellin Ia - s
18          659.5   18.3           2555    2   A40043  notch protein homo
19          658     18.3           1964    2   T09059  notch4 - mouse
20          648.5   18.0           2352    2   T30201  Notch homolog prot
21          644.5   17.9           2201    2   A32160  tenascin-C - human
22          640.5   17.8           2019    1   JQ1322  tenascin precursor
23          636     17.7           1810    1   A32230  tenascin precursor
24          631.5   17.5           2139    2   A35672  crumbs protein - f
25          616     17.1           1220    2   A56136  jagged protein pre
26          611     17.0           1746    1   S19694  tenascin precursor
27          593.5   16.5           3672    2   T23433  hypothetical prote
28          593.5   16.5           3704    2   T37316  probable laminin a
29          587     16.3           1408    2   S16148  gene serrate prote
30          586     16.3           1722    2   E89753  protein F11C7.4 [i
31          586     16.3           3635    2   T10053  laminin alpha 5 ch
32          580.5   16.1           861     2   A48825  Notch homolog Mote
33          577.5   16.0           1801    1   MMRTS   laminin beta-2 cha
34          576.5   16.0           3712    2   S18253  laminin alpha-1 ch
35          565.5   15.7           2823    2   T23064  hypothetical prote
36          565.5   15.7           2823    2   F87908  protein T22A3.8 [i
37          565.5   15.7           3102    2   T43291  laminin alpha chai
38          562.5   15.6           647     2   A43902  tenascin - eastern
39          561     15.6           1798    2   S53869  laminin beta-2 cha
40          561     15.6           3106    1   S53868  laminin alpha-2 ch
41          560.5   15.6           1429    2   S06434  homeotic protein 1
42          556     15.4           833     2   S19087  gene Delta protein
43          552     15.3           832     2   A31246  neurogenic protein
44          552     15.3           880     2   S00670  neurogenic repetit
45          550     15.3           1797    2   A55677  laminin beta-2 cha



ALIGNMENTS 



RESULT 1 
T13954 

MEGF6 protein - rat 

C;Species: Rattus norvegicus (Norway rat) 

C;Date: 20-Sep-1999 #sequence_revision 20-Sep-1999 #text_change 21-Jul-2000 
C;Accession: T13954 

R;Nakayama, M.; Nakajima, D.; Nagase, T.; Nomura, N.; Seki, N.; Ohara, O. 
Genomics 51, 27-34, 1998 

A;Title: Identification of high-molecular-weight proteins with multiple EGF-like 
motifs by motif-trap screening. 

A;Reference number: Z14126; MUID:98360089; PMID:9693030 
A;Accession: T13954 

A;Status: preliminary; translated from GB/EMBL/DDBJ 
A;Molecule type: mRNA 
A;Residues: 1-1574 <NAK> 

A;Cross-references: EMBL:AB011532; NID:g3449293; PIDN:BAA32462.1; PID:g3449294 
A;Experimental source: strain Sprague-Dawley; brain 
C;Genetics: 
A;Gene: MEGF6 



Query Match 38.1%; Score 1372.5; DB 2; Length 1574; 

Best Local Similarity 42.2%; Pred. No. 1.6e-68; 

Matches 230; Conservative 60; Mismatches 199; Indels 56; Gaps 





Qy 89 RRKSQCCPGFYESGEMCVPHCADKCVH-GRCIAPNT---CQCEPGWGGTNCSSACDGDHWG 145 

I : | | : | : | i II 1 : 1 1 II III: III 1 1 1 

Db U1U ROODTCS AGWYGTG -- COT RCA -- CANDGHC-DPTTGRCSCAPGWTGLSCORACDSGHWG 870 

Qy 146 PHCTSRCQCKNG-ALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCD 204 

I | II | 1 : : : 1 1 1 1 1 : 1 1 1 1 1 1 1 1 1 : : : : | 

Db 871 PDCIHPCNCSAGHGNCDAVSGLCLCEAGYEGPRCEQSCRQGYYGPSCEQKCRCEHGAACD 930 

Qy 205 HVTGECRCPPGYTGAFCED LCPPGKHGPQC 234 

1 1 : 1 1 i 1 1 : 1 : 1 1 1 : | | | : | | : | 

Db 931 m/SC AnTOP ACYaJR^S FOFTTAOP AfTFFOLDOT) S ACTIOS Afi APCDAVTGSC TCP AGRWGPRC 990 

Qy 235 EQRCP CQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQEC 281 

III III 1 Mhll MM 11111:1111 1 

Db 991 an^TPPT tfpj MO snTOTVFNOASODSVTOOOHO APOl^OPTOLOACPPGLYGKNCOHSC 1050 

Qy 282 QCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLC 341 

1 1 II 1 1 1 1 1 1 1 : II 1 : : 1 1 II 1 1 1 : : 1 1 1 "INI 

Db 1051 T rRMnr^RrriPTT,r;OCTCPFCWTCT,ACFNRCT,PGHYAAGCOT,NCSCLHGGICDRLTGHCLC 1110 

Qy 342 EAGFAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCS 401 

1 I : I : : 1 : : 1 1 : 1 : 1 : : 1 1 1 1 1 : : I 1 1 1 1 1 1 : 1 : 1 

Db 1111 parwrnnKrn^ s-nvsr^TFOVHOFFHOAO -- rkoaschhvtgaofcppgwrgphceoacp 1167 

Qy 402 PGFYGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKND 461 

1 : : I I 1 1 1 1 1 II 1 1 1 : 1 1 1 1 1 1 : 1 1 1 1 : 1 : 1 II = 

Db 1168 RfiWFRFArAORrT,CPTNASCHHVTGECRCPPGFTGLSCE0AC0PGTFGKDCEHLC0CPGE 1227 

Qy 462 A-VCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGW 520 

i 1 I III 1 1 : 1 1 1 11111:111 1 : 1 1 1 II 1 : III 1 : 

Db 1228 TWACDPASGVCTCAAGYHGTGCLQRCPSGRYGPGCEHICKCLNGGTCDPATGACYCPAGF 1287 

Qy 521 RGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPN 580 

1 Ml 1 : 1 : II II 1 1 : 1 1 1 1 1 : 1 1 1 : 1 : 1 : 1 

Db 1288 LGADCSLACPQGRFGPSCAHVCACRQGAACDPVSGACICSPGKTGVRCEHGCPQDRFGKG 1347 

Qy 581 CSLPC 585 

1 1 1 

Db 1348 CELKC 1352 





RESULT 2 
T27283 

hypothetical protein Y64G10A.f - Caenorhabditis elegans 
C;Species: Caenorhabditis elegans 

C;Date: 15-Oct-1999 #sequence_revision 15-Oct-1999 #text_change 15-Oct-1999 
C;Accession: T27283 
R;Ainscough, R. 

submitted to the EMBL Data Library, September 1999 
A;Reference number: Z20336 
A;Accession: T27283 

A;Status: preliminary; translated from GB/EMBL/DDBJ 

A;Molecule type: DNA 

A;Residues: 1-1620 <WIL> 

A;Cross-references: EMBL:AL110498; NID:e1542303; PIDN:CAB54471.1; CESP:Y64G10A.f 
A;Experimental source: clone Y64G10A 
C;Genetics: 

A;Gene: CESP:Y64G10A.f 

A;Introns: 77/1; 116/1; 198/1; 282/1; 365/1; 425/1; 466/1; 548/1; 559/1; 601/1; 

625/1; 715/1; 782/3; 845/1; 895/2; 956/1; 1105/1; 1221/1; 1307/1; 1445/2 

Query Match 36.7%; Score 1322; DB 2; Length 1620; 

Best Local Similarity 36.3%; Pred. No. 9.9e-66; 

Matches 235; Conservative 76; Mismatches 234; Indels 102; Gaps 12; 
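
Each result in this report carries the same three statistics lines, written as semicolon-terminated `name value` pairs. For anyone wanting to check or tabulate them, the sketch below shows one way to pull them into a dictionary. The regular expression and the `parse_stats` name are illustrative assumptions based on the layout seen in this report, not part of GenCore.

```python
import re

# Statistics lines copied from the RESULT 2 header above.
STATS = """\
Query Match 36.7%; Score 1322; DB 2; Length 1620;
Best Local Similarity 36.3%; Pred. No. 9.9e-66;
Matches 235; Conservative 76; Mismatches 234; Indels 102; Gaps 12;
"""

# A field is a name (letters, dots, spaces), a number, an optional '%', then ';'.
FIELD = re.compile(
    r"([A-Za-z][A-Za-z. ]*?)\s+"
    r"([0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?)%?\s*;"
)


def parse_stats(text):
    """Return {field name: numeric value} for every 'name value;' pair found."""
    return {name.strip(): float(value) for name, value in FIELD.findall(text)}


stats = parse_stats(STATS)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Fields whose value was lost in extraction (for example a bare `Gaps` with no number) are simply skipped by this pattern instead of raising an error.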

Qy 16 LCHWI -GTASPLNLEDPNVCSHW ESYS VTVQESYPHPFDQI YYTSCTDIL 64 

: I I : I I : I : : I : : I : I I : : : I : : 

Db 872 VCHHVTGTCTCLPGKTGPLCDQCLIFVETIEFDIAFSINVIACAPNTYGPNCAHTCS-CV 930 

Qy 65 NWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCAD KC 113 

III I I I I I I I I I II 

Db 931 NGAKCDESDGS CHCTPGFY — GATCSEVCPTGRFGIDCMQLCKC 972 

Qy 114 VHGR-CIAPN-TCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAA 171 

: I I I :|:| III I I II : I I : : I I : I I : I I I I 

Db 973 QNGAICDTSNGSCECAPGWSGKKCDKACAPGTFGKDCSKKCDCADGMHCDPSDGECICPP 1032 

Qy 172 GFRGWRCEDRCEQGTYGNDCHQRCQCQNGAT 202 

I : I : I : : I : I : I I I I I I I I I 
Db 1033 GKKGHKCDETCDSGLFGAGCKGICSCQNGATCDSVTGSCECRPGWRGKKCDRPCPDGRFG 1092 

Qy 203 CDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRC 238 

I I I I I I I I I I I I : I I I : I I I : I I I I 

Db 1093 EGCNAICDCTTTNDTSMYNPFVARCDHVTGECRCPAGWTGPDCQTSCPLGRHGEGCRHSC 1152 

Qy 239 PCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHC 2 98 

I I I I t I I I I I I I : I I I I I I I : I I I : I I : I I I : III 

Db 1153 QCSNGASCDRVTGFCDCPSGFMGKNCESECPEGLWGSNCMKHCLCMHGGECNKENGDCEC 1212 

Qy 299 SPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEG 358 

I : I I I I I : I I I : I I I I I : I I I I : : I I M : I I 

Db 1213 IDGWTGPSL CPFGQFGRNCAQRCNCKNGASCDRKTGRCECLPGWSGEHCE-KSCVSG 1268 

Qy 359 LYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNG 418 

II II:: I I II I I : I I I : I : I I I I II I I : : I I I I I I 

Db 1269 HYGAKCEETCEC--ENGALCDPISGHCSCQPGWRGKKCNRPCLKGYFGRHCSQSCRCANS 1326 

Qy 419 ADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWH 478 

I I : : I : I I I : I I : I I I I : I : I I : I I : : : I : I I I I I 

Db 1327 KSCDHISGRCQCPKGYAGHSCTELCPDGTFGESCSQKCDCGENSMCDAISGKCFCKPGHS 1386 

Qy 479 GVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNC 538 

III I I : I II | | I | I I : : I : I I II : I I I i : I I : I I 

Db 1387 GSDCKSGCVQGRFGPDCNQLCSCENGGVCDSSTGSCVCPPGYIGTKCEIACQSDRFGPTC 1446 

Qy 539 AERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585 

: I : I : I I I i I I I I I : : I : I : I I I I I : I I I 

Db 1447 EKICNCENGGTCDRLTGQCRCLPGFTGMTCNQVCPEGRFGAGCKEKC 1493 



RESULT 3 



T26972 

hypothetical protein Y47H9C.4 - Caenorhabditis elegans 
C;Species: Caenorhabditis elegans 

C;Date: 15-Oct-1999 #sequence_revision 15-Oct-1999 #text_change 17-Mar-2000 
C;Accession: T26972 
R;Harris, B. 

submitted to the EMBL Data Library, October 1998 
A;Reference number: Z20293 
A;Accession: T26972 

A;Status: preliminary; translated from GB/EMBL/DDBJ 

A;Molecule type: DNA 

A;Residues: 1-1111 <WIL> 

A;Cross-references: EMBL:AL032657; PIDN:CAA21739.1; GSPDB:GN00019; CESP:Y47H9C.4 

A;Experimental source: clone Y47H9C 

C;Genetics: 

A;Gene: CESP:Y47H9C.4 

A;Map position: 1 

A;Introns: 50/2; 84/2; 150/1; 238/3; 342/3; 797/1; 851/1; 947/2; 1017/1; 1083/1 
C;Superfamily: unassigned ankyrin repeat proteins; ankyrin repeat homology; EGF 
homology 

Query Match 35.7%; Score 1284.5; DB 2; Length 1111; 

Best Local Similarity 34.2%; Pred. No. 9e-64; 

Matches 246; Conservative 77; Mismatches 221; Indels 175; Gaps 20; 



Qy 21 GTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYT SCTDILNWFKCTRHR 73 

III : : 1 1 : 1 : : 1 : : : 1 | : | I : I 

Db 35 GTTEP QGDHVCT VKTIVDDY--ELKKVIHTWYNDTEQCLNPLTGFQC 80 

Qy 74 VSYRTAYRHGEKTMYRRK SQCCPGFYESGE-MCVPHCADKCVHGRCIAPNTC 124 

1 : 1 : 1 1 : 1 : 1 1 1 1 : 1 : : : 1 : 1 1 1 1 : 1 1 1 1 

Db 81 TVEKRGQKASYQRQLVKKEKYVKQCCDGYYQTKDHFCLPDCNPPCKKGKCIEPGKC 136 

Qy 125 QCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCE 179 

: 1 : 1 1 : 1 1 1 : 1 : 1 II 1 : 1 1 : 1 1 1 1 : 1 1 1 1 : 1 1 : 1 1 1 1 

Db 137 ECDPGYGGKYCASSCSVGTWGLGCSKSCDCENGANCDPELGTCICTSGFQGERCEKPCPD 196 

Qy 180 DRCEQGTYGNDCHQRCQCQNGAT 202 

: : 1 1 : 1 : 1 : 1 1 1 1 1 1 1 1 

Db 197 NKWGPNCVKSCPCQNGGKCNKEGKCVCSDGWGGEFCLNKCEEGKFGAECKFECNCQNGAT 256 

Qy 203 CDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGT 262 

II: 1:11 1 1 1 1 1 1 : 1 1 ! 1 1 : 1 1 1 1 : 1 M 1 Ml 

Db 257 CDNTNGKCICKSGYHGALCENECSVGFFGSGCTQKCDCLNNQNCDSSSGECKC-IGWTGK 315 

Qy 263 VCGQPCPEGRFGKNCSQECQC HNGGTCDAATGQCHCSPGYTGERCQD-ECPVGT 315 

1 1 1 1 II 1 1 1 1 : : 1 1 1 1 1 1 1 1 I I I : I : : 1 

Db 316 HCDIGCSRGRFGLQCKQNCTCPGLEFSDSNASCDAKTGQCQCESGYKGPKCDERKCDAEQ 375 

Qy 316 YGVLCAETCQCV--NGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLE 373 

II 1 : : 1 1 1 1 1 1 : 1 1 1 : II 1 : 1 1 1 : II I : 1 : 1 

Db 376 YGADCSKTCTCVRENTLMCAPNTGFCRCKPGFYGDNCEL-ACSKDSYGPNCEKQAMCDWN 434 

Qy 374 NTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSC-QNGADCDSVTGKCTCAP 432 

: 1 : 1 : 1 1 1 1 1 1 : 1 1 : 1 1 Ml 1 1 1 1 1 II 1 II 1 

Db 435 HASECNPETGSCVCKPGRTGKNCSEPCPLDFYGPNCAHQCQCNQRGVGCDGADGKCQCDR 494 



Qy 


433 


Db 


495 


QY 


493 


Db 


555 


Qy 


532 


Db 


614 


Qy 


544 


Db 


674 



I I II I : I I I II I I h I III II I : I I I I I : : 



'GCNLTCQCLNGGACNTLDGTCTCAPGW RGEKCEL — PCQD 531 

I I I I : I : I I I : I I I I I I : I I III I I I 



•GTYGLNCAERCD 54 3 

II I : : I : : I I 



I I I I I I I I I I : I I I : I : : I hi 



RESULT 4 
A35844 

Xotch protein - African clawed frog 

C;Species: Xenopus laevis (African clawed frog) 

C;Date: 12-Oct-1990 #sequence_revision 12-Oct-1990 #text_change 02-Aug-2002 
C;Accession: A35844 

R;Coffman, C.; Harris, W.; Kintner, C. 
Science 249, 1438-1441, 1990 

A;Title: Xotch, the Xenopus homolog of Drosophila notch. 
A;Reference number: A35844; MUID:90385285; PMID:2402639 
A;Accession: A35844 

A;Status: preliminary; nucleic acid sequence not shown; not compared with 
conceptual translation 
A;Molecule type: mRNA 
A;Residues: 1-2524 <COF> 

C;Superfamily: notch protein; ankyrin repeat homology; EGF homology 

C;Keywords: transmembrane protein 

F;146-177/Domain: EGF homology <EGX1> 
F;184-215/Domain: EGF homology <EGF1> 
F;222-254/Domain: EGF homology <EGF> 
F;456-487/Domain: EGF homology <EGX2> 
F;757-788/Domain: EGF homology <EGF3> 
F;1025-1056/Domain: EGF homology <EGX3> 
F;1924-1956/Domain: ankyrin repeat homology <AN1> 
F;1957-1989/Domain: ankyrin repeat homology <AN2> 
F;1991-2023/Domain: ankyrin repeat homology <AN3> 
F;2024-2056/Domain: ankyrin repeat homology <AN4> 
F;2057-2089/Domain: ankyrin repeat homology <AN5> 

Query Match 20.0%; Score 719; DB 2; Length 2524; 

Best Local Similarity 25.9%; Pred. No. 1.8e-32; 

Matches 225; Conservative 60; Mismatches 221; Indels 364; Gaps 51 

Qy 5 LNSCLSFICL LLCHWIGTASPLNLED PNVCSHW ESYS 41 

: I I I I I I : I : : I I I : I : : I 

Db 604 INECLSKPCLNGGQCTDRENGYICTCPKGTTGVNCETKIDDCASNLCDNGKCIDKIDGYE 663 

Qy 42 VTVQESYPHPFDQI YYT SCTDILNWFKCTRHRVS YRTAYRHGEKTMYRR 90 

I : I I : I I : I I I 
Db 664 CTCEPGYTGKLCNININECDSNPCRNGGTCKDQINGFTCV 703



Qy 91 KSQCCPGFYESGEMC VPHC-ADKCVHGRC IAPNTCQCEPGWGGTNC SSACD 140

III II | | : : | : I I I : I II II hll : : I : 

Db 704 CPDGYHD-HMCLSEVNECNSNPCIHGACHDGVNGYKCDCEAGWSGSNCDINNNECE 758 

Qy 141 GDHWGPHCTSRCQCKNGALCNPITGA--CHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQ 198

: I I I I : I I I I I I I I I I : I i : I I 

Db 759 SN PCMNGGTCKDMTGAYICTCKAGFSGPNCQ TNINECSSN- PCL 801 

Qy 199 NGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRC PCQNGGVCHH V 249 

111111:11 I I I I I I : I I Ihllll 

Db 802 NHGT C I D DVAG YKCN CML P YT GAI CEAVLAP CAGSPCKNGGRCKESEDFE 851

Qy 250 TGECSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTCDAATG--QCHCSPGYTG 304

I I I I I I I I : II I I I I I I : I : I I I I I I 

Db 852 TFSCECPPGWQGQTC EIDMNECVNRPCRNGATCQNTNGSYKCNCKPGYTG 901

Qy 305 ERCQ DECPVGTYGVLCAETCQCVNGGKCYHVSGA--CLCEAGFAGERCEARL 354

I : I : I : I Ml I I I I I I I : I I : 

Db 902 RNCEMDIDDC QPNPCHNGGSCSDGINMFFCNCPAGFRGPKCEEDINECAS 951 

Qy 355 CPEGLYGIKCDKRCPCHLE NTHSCHPMSG ECAC 387 

I I III: I I | : I : I I I 

Db 952 NPCKNGANCTDCVNSYTCTCQPGFSGIHCESNTPDCTESSCFNGGTC--IDGINTFTCQC 1009

Qy 388 KPGWSGLYC NE TCSPGFYGEACQQI CS CQ 416 

I I : : I I I II | | | : | M : I I : 

Db 1010 PPGFTGSYCQHDINECDSKPCLNGGTCQDSYGTYKCTCPQGYTGLNCQNLVRWCDSSPCK 1069 

Qy 417 NGADCDSVTG--KCTCAPGFKGI DCSTP 442

III : I I I : I : I I 

Db 1070 NGGKCWQTNNFYRCECKSGWTGVYCDVPSVSCEVAAKQQGVDIVHLCRNSGMCVDTGNTH 1129 

Qy 443 CPLGTYGINCSSR CG CKNDAVCSPVDG--SCTCKAGWHGVDCS 483

I I I I : I I : I I I : I I I I I I : I I I : I I 

Db 1130 FCRCQAGYTGSYCEEQVDECSPNPCQNGATCTDYLGGYSCECVAGYHGVNCSEEINECLS 1189

Qy 484 IRCPSGTWGFGCNLT C QCLNGGACNTLDG 512 

I I I I I I : I : I I I I I 

Db 1190 HPCQNGGTCIDLINTYKCSCPRGTQGVHCEINVDDCTPFYDSFTLEPKCFNNGKCIDRVG 1249 

Qy 513 --TCTCAPGWRGEKCE LPCQD-GTYGLNCAE RCDC SH 546

I I I I : I I : I I II II II: I I : I I 

Db 1250 GYNCICPPGFVGERCEGDVNECLSNPCDSRGTQ--NCIQLVNDYRCECRQGFTGRRCESV 1307

Qy 547 ADGC HPTTGH-CRCLPGWSGVHCD 569 

III : | | : | | | : | | : 

Db 1308 VDGCKGMPCRNGGTCAVASNTERGFICKCPPGFDGATCEYDSRTCSNLRCQNGGTCISVL 1367 

Qy 570 SVCAEGRWGPNC SLPCY 586 

I I : I I I I I I I I 

Db 1368 TSSKCVCSEGYTGATCQYPVISPCASHPCY 1397 

RESULT 5 
S42612 

transmembrane protein precursor - zebra fish 



C;Species: Brachydanio rerio (zebra fish)

C;Date: 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change 02-Aug-2002
C;Accession: S42612

R;Bierkamp, C.; Campos-Ortega, J.A.
Mech. Dev. 43, 87-100, 1993

A;Title: A zebrafish homologue of the Drosophila neurogenic gene Notch and its

pattern of transcription during early embryogenesis.

A;Reference number: S42612; MUID:94128602; PMID:8297791

A;Accession: S42612

A;Status: preliminary

A;Molecule type: mRNA

A;Residues: 1-2437 <BIE>

A;Cross-references: EMBL:X69088; NID:g433866; PIDN:CAA48831.1; PID:g433867

C;Superfamily: notch protein; ankyrin repeat homology; EGF homology

F;755-786/Domain: EGF homology <EGF1>

F;1023-1054/Domain: EGF homology <EGF>

F;1185-1216/Domain: EGF homology <EGF2>

F;1915-1947/Domain: ankyrin repeat homology <AN1>

F;1948-1980/Domain: ankyrin repeat homology <AN2>

F;1982-2014/Domain: ankyrin repeat homology <AN3>

F;2015-2047/Domain: ankyrin repeat homology <AN4>

F;2048-2080/Domain: ankyrin repeat homology <AN5>

Query Match 19.9%; Score 717; DB 2; Length 2437; 

Best Local Similarity 25.7%; Pred. No. 2.2e-32; 

Matches 221; Conservative 60; Mismatches 208; Indels 370; Gaps 49; 



Qy 4 SLNSCLS FI CLLLCHWI GTAS PLNLEDPNVCSHWES YSVTVQES Y 48

: : i 1 1 1 : 1 1 1 : 1 : : 1 1

Db 601 NINECLSQPCRNGGTCQDRENAYICTCPKGTTGVNCEINIDD CKR 645

Qy 49 PHPFDQI YYTSCTDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMC 105

1 1 | | | : | : : 1 1 1 1 : 1 1 1 1

Db 646 -KPCD YGKCIDKINGYECV CEPGY -- SGSMCNIN 676

Qy 106 VPHCA DKCVHGRC IAPNT 123

: || : 1 : II 1 1

Db 677 IDDCALNPCHNGGTCIDGVNSFTCLCPDGFRDATCLSQHNECSSNPCIHGSCLDQINSYR 736

Qy 124 CQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGA--CHCAAGFRGWRCEDR 181

I 1 1 1 1 1 1 1 : II 1 1 1 1 : 1 1 1 1 1 1 1 1 :
Db 737 CVCEAGWMGRNCDININ ECLSN-PCVNGGTCKDMTSGYLCTCRAGFSGPNCQMN 789

Qy 182 CEQGTYGNDCHQRCQCQNGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRCP 239
1 : 1 1 i : 1 1 1 1 : 1 1 III 1 1 : : 1 1 II

Db 790 I N E CAS N-PCLNQGSCID D VAG F KCN CML P YT GE VC EN VLAP CSPR-P 835

Qy 240 CQNGGVCHH VTGECSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTCDAA 292

1 : 1 1 1 1 1 : 1 : 1 1 : 1 1 1 1 II 1 1 1 1 1 :
Db 836 CKNGGVCRESEDFQSFSCNCPAGWQGQTCEVDI NECVRNPCTNGGVCENL 885

Qy 293 TG -- QCHCSPGYTGERCQ DECPVGTYGVLCAETCQCVNGGKCY-HVSG-ACLCEAGF 345

1 II 1 : 1 1 : 1 1 1 : 1 : 1 I 1 1 1 1 1 hi 1 : 1 1 1 1

Db 886 RGGFQCRCNPGFTGALCENDIDDC EPNPCSNGGVCQDRVNGFVCVCLAGF 935

Qy 346 AGERCEARL CPEGLYGIKCDKRCPCHLENTHSCHP- 380

I I I I :

Db 936 RGERCAEDIDECVSAPCRNGGNCTDCVNSYTCSCPAGFSGINCEINTPDCTES--SCFNG 993

Qy 381 MSGECACKPGWSGLYC NE TCSPGFYGEA 408

1 1 1 II 1 II II 1 1 1 ■ 1

Db 994 GTCVDGISSFSCVCLPGFTGNYCQHDVNECDSRPCQNGGSCQDGYGTYKCTCPHGYTGLN 1053

Qy 409 CQQI CS CQNGADC--DSVTGKCTCAPGFKGIDCSTP 442

II: 1 1 : 1 1 1 : 1 II h II 1 1

Db 1054 CQSLVRWCDSSPCKNGGSCWQQGASFTCQCASGWTGIYCDVPSVSCEVAARQQGVSVAVL 1113

Qy 443 CPLGTYGINCSSRCG CKNDAVCSPVDG -- SCTCKAGW 477

1 1 1 1 : 1 : 1 1 1 : III:

Db 1114 CRHAGQCVDAGNTHLCRCQAGYTGSYCQEQVDECQPNPCQNGATCTDYLGGYSCECVPGY 1173

Qy 478 HGVDCS IRCPSGTWGFGCNL TC 499

II : : 1 1 1 1 1 1 1 1 : 1

Db 1174 HGMNCSKEINECLSQPCQNGGTCIDLVNTYKCSCPRGTQGVHCEIDIDDCSPSVDPLTGE 1233

Qy 500 -QCLNGGACNTLDG -- TCTCAPGWRGEKCE LPCQ-DGTYGLNCAE RC 542

• 1 1 1 1 1 1 II 1 . 1 1 ■ 1 1 II 1 • 1 II- II

Db 1234 PRCFNGGRCVDRVGGYGCVCPAGFVGERCEGDVNECLSDPCDPSGSY--NCVQLINDFRC 1291

Qy 543 DCS HA DGCHPT TGH CRCLPGWSGVHCD 569

: 1 : 1 1 1 II 1 : 1 1 1 : 1 1 1 :
Db 1292 ECRTGYTGKRCETVFNGCKDTPCKNGGTCAVASNTKHGYICKCQPGYSGSSCEYDSQSCG 1351

Qy 570 SVCAEGRWGPNC 581

: 1 1 II

Db 1352 SLRCRNGATCVSGHLSPRC 1370






RESULT 6 
S78549 

notch3 protein - human 

C;Species: Homo sapiens (man)

C;Date: 24-Jul-1998 #sequence_revision 24-Jul-1998 #text_change 08-Sep-2002

C;Accession: S78549; S71825

R;Joutel, A.; Tournier-Lasserve, E.

submitted to the EMBL Data Library, April 1997

A;Reference number: S78549

A;Accession: S78549

A;Molecule type: mRNA

A;Residues: 1-2321 <JOU1>

A;Cross-references: EMBL:U97669; NID:g2668591; PIDN:AAB91371.1; PID:g2668592
R;Joutel, A.; Corpechot, C.; Ducros, A.; Vahedi, K.; Chabriat, H.; Mouton, P.;
Alamowitch, S.; Domenga, V.; Cecillion, M.; Marechal, E.; Maciazek, J.;
Vayssiere, C.; Cruaud, C.; Cabanis, E.A.; Ruchoux, M.M.; Weissenbach, J.; Bach,
J.F.; Bousser, M.G.; Tournier-Lasserve, E.
Nature 383, 707-710, 1996

A;Title: Notch3 mutations in CADASIL, a hereditary adult-onset condition causing
stroke and dementia.

A;Reference number: S71825; MUID:97032728; PMID:8878478
A;Accession: S71825

A;Status: nucleic acid sequence not shown
A;Molecule type: DNA

A;Residues: 67-113;138-194;268-333,'G',335-346;536-613;716-765;1240-1279;1815-
1888 <JOU2>



A;Cross-references: EMBL:U97669

C;Genetics:

A;Gene: notch3

A;Map position: 19p13.1

C;Function:

A;Description: may be involved in pathogenesis of CADASIL, causing a type of
stroke and dementia

C;Superfamily: notch protein; ankyrin repeat homology; EGF homology

C;Keywords: tandem repeat; transmembrane protein

F;123-155/Domain: EGF homology <EGX1> 

F;162-194/Domain: EGF homology <EGF1>

F;240-271/Domain: EGF homology <EGX2>

F;318-349/Domain: EGF homology <EGF>

F;473-504/Domain: EGF homology <EGX3>

F;853-884/Domain: EGF homology <EGF3>

F;928-959/Domain: EGF homology <EGX4>

F;1838-1870/Domain: ankyrin repeat homology <AN1>

F;1871-1903/Domain: ankyrin repeat homology <AN2>

F;1905-1937/Domain: ankyrin repeat homology <AN3>

F;1938-1970/Domain: ankyrin repeat homology <AN4>

F;1971-2003/Domain: ankyrin repeat homology <AN5>



Query Match 19.4%; Score 697; DB 2; Length 2321; 

Best Local Similarity 25.2%; Pred. No. 2.7e-31; 

Matches 226; Conservative 61; Mismatches 249; Indels 360; Gaps 



Qy 5 LNSCLS FICLLLCHWIGTASPLNLED PNVC-SHWESY 40

: | | | | | | : : : | | :::: : II
Db 432 VNECLSGPCRNQATCLDRIGQFTCICMAGFTGTYCEVDIDECQSSPCVNGGVCKDRVNGF 491

Qy 41 SVTVQESYPHPFDQIYYTSC--TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGF 98
I T : 1 : 1 1 1 1 1 1 : : 1 M

Db 492 SCTCPSGFSGSTCQLDVDECASTPCRNGAKCVDQPDGY ECRCAEGF 537

Qy 99 YESGEMC VPHCA-DKCVHGRC IAPNTCQCEPGWGGTNCS SACDGDHWGPHCTSR 151

1 : 1 1 1 : 1 1 1 1 1 1 II : 1 1 1 1 : 1 1 1 1 1 1 1 :

Db 538 --EGTLCDRNVDDCSPDPCHHGRCVDGIASFSCACAPGYTGTRCESQVD ECRSQ 589

Qy 152 CQCKNGALCNPITG -- ACHCAAGFRGWRCE DRCEQG -- TYG--NDCHQR--CQCQNG 200

I : : I 1 : 1 1 : 1 1 1 1 II hi 1 III!!
Db 590 -PCRHGGKCLDLVDKYLCRCPSGTTGVNCEVNIDDCASNPCTFGVCRDGINRYDCVCQPG 648

Qy 201 AT CDHVTGECRCPPGYTGAFCED LCPPGKHGPQC EQRC PCQNGG 244

I I : 1 I 1 III 1 1 1 1 1 M 1 H = l
Db 649 FTGPLCNVEINECASSPCGEGGSCVDGENGFRCLCPPGSLPPLCLPPSHPCAHEPCSH-G 707



Qy 


245 


: 1 : 1 1 1 1 1 1 1 1 lllh 
ICYDAPGGFRCVCEPGWSGPRCSQSLARDACESQPCRAGGTCSSDGMGFHCTCPPGVQGR 


275 


Db 


708 


767 


Qy 276 NCS -- QEC QCHNGGTCDAATGQ CHCSPGYTGERCQ DEC PVGTYGVLC 320

| 1 1 : 1 1 1 : : 1 1 1 II 1 : 1 III Ml 1 1 M : 1

Db 768 QCELLSPCTPNPCEHGGRCESAPGQLPVCSCPQGWQGPRCQQDVDECAGPAPCGPHGI-C 826

Qy 321 AE TCQ CVNGGKCYHVSG-- ACLCEAGFAGERCEA 352

1 I 1:1111 I : 1 1 1 1 1 1 1 1 1
Db 827 TNLAGSFSCTCHGGYTGPSCDQDINDCDPNPCLNGGSCQDGVGSFSCSCLPGFAGPRC-A 885




Qy 


OJO 


nu 
1JD 


0 0 O 


Qy 


o o o 


DD 


Q Q Q 


Qy 


/I '1 '1 


DD 


QQQ 

Z? ZJ 


Qy 


A A O 

4 43 


DD 


i u o y 


Qy 


4 82 


Db 


'1 '1 '1 Q 

1 1 1 y 


Qy 


504 


Db 


1179 


Qy 


556 


Db 


1231 



LCPEGLYGIKCDKRCPCHLENTHSCHPM 3 81 

I I I I I :: I M 
1CLSNPCGPGTCTDHVAS FTCTCPPGYGGFHCEQDLP DCSPSSCFNGGT 93 8 

-SGECACKPGWSGLYCNE TCSPGFYGEACQ 410 

I I |:||::| :| M I I H 

r MSFSCLCRPGYTGAHCQHEADPCLSRPCLHGGVCSAAHPGFRCTCLESFTGPQCQ 99 8 

CS CQNGADCDSVTGKCTCAPGFKGI DC STP 44 2 

II I I I I I I I I I : I I II 



-CPLGTYGINCSSRCG CKNDAVCSPVDGS--CTCKAGWHGVD 481 

|:: I I | | |::| : 



-IRCPSGTWGFGCNLT C QCLN 503 

i I I I I I : I : I I : 



iGACNTLDG— TCTCAPGWRGEKCEL PCQDGTYGLNCAERCDCSHADGCHPTTG 555 

I I I I I I I I I : I : I I I : I : I I 

GTCVDLVGGFRCTCPPGYTGLRCEADINECRSGA CHAAHTRDCLQDPGGGF 12 30 

CRCLPGWSGVHCDSV CAEGRWGPNC 581 

II I : I I I : I 11:1111 



RESULT 7 
A46019 

notch-1 protein - mouse 

N;Alternate names: motch protein

C;Species: Mus musculus (house mouse)

C;Date: 22-Sep-1993 #sequence_revision 18-Nov-1994 #text_change 07-Mar-2003

C;Accession: A46019; S25144; C49175; B46438; A46438; PH1569; S32109

R;del Amo, F.F.; Gendron-Maguire, M.; Swiatek, P.J.; Jenkins, N.A.; Copeland,

N.G.; Gridley, T.

Genomics 15, 259-264, 1993

A;Title: Cloning, analysis, and chromosomal localization of Notch-1, a mouse
homolog of Drosophila Notch.

A;Reference number: A46019; MUID:93194170; PMID:8449489
A;Accession: A46019

A;Status: not compared with conceptual translation
A;Molecule type: nucleic acid
A;Residues: 1-2531 <DEL>

A;Cross-references: GB:Z11886; GB:S47228; NID:g288502; PIDN:CAA77941.1;
PID:g288503

A;Note: sequence extracted from NCBI backbone (NCBIP:127318)

R;Franco del Amo, F.; Smith, D.E.; Swiatek, P.J.; Gendron-Maguire, M.;

Greenspan, R.J.; McMahon, A.P.; Gridley, T.

submitted to the EMBL Data Library, April 1992

A;Description: Expression pattern of Motch, a mouse homolog of Drosophila Notch,
suggests an important role in early postimplantation mouse development.
A;Reference number: S25144
A;Accession: S25144



A;Molecule type: mRNA

A;Residues: 1551-2108,'Q',2110-2114,'ALP',2118-2170 <FRA>

A;Cross-references: EMBL:Z11886

R;Lardelli, M.; Lendahl, U.

Exp. Cell Res. 204, 364-372, 1993

A;Title: Motch A and Motch B--two mouse Notch homologues coexpressed in a wide
variety of tissues.

A;Reference number: A49175; MUID:93178563; PMID:8440332
A;Accession: C49175

A;Status: preliminary; nucleic acid sequence not shown

A;Molecule type: mRNA

A;Residues: 1161-1547 <LAR>

A;Cross-references: EMBL:X68278; NID:g287987; PIDN:CAA48339.1; PID:g287988
A;Experimental source: embryo

A;Note: sequence extracted from NCBI backbone (NCBIP:126159)

R;Kopan, R.; Weintraub, H.

J. Cell Biol. 121, 631-641, 1993

A;Title: Mouse notch: expression in hair follicles correlates with cell fate
determination.

A;Reference number: A46438; MUID:93252998; PMID:8486742
A;Accession: B46438
A;Status: preliminary
A;Molecule type: nucleic acid

A;Residues: 1865-1932,'RR',1935-1937,'L',1938-1967,'I',1969-2044,'IE',2047-
2052,'S',2054-2056,'SIRRE',2062-2075 <KOP>
A;Experimental source: embryo

A;Note: sequence extracted from NCBI backbone (NCBIN:131246, NCBIP:131247)

C;Comment: This protein has many EGF repeats and lin-12/Notch repeats.

C;Comment: This protein is one of the neurogenic proteins controlling the

decision between ectodermal and neural fate for cells in the early embryo.

C;Genetics:

A;Gene: notch-1

A;Map position: 2

A;Note: proximal region of chromosome 2

C;Superfamily: notch protein; ankyrin repeat homology; EGF homology

F;106-138/Domain: EGF homology <EGF1>

F;144-175/Domain: EGF homology <EG01>

F;222-254/Domain: EGF homology <EGF2>

F;261-292/Domain: EGF homology <EG02>

F;339-370/Domain: EGF homology <EG03>

F;416-449/Domain: EGF homology <EGF3>

F;456-487/Domain: EGF homology <EG04>

F;494-525/Domain: EGF homology <EG05>

F;532-563/Domain: EGF homology <EG06>

F;607-638/Domain: EGF homology <EG07>

F;682-713/Domain: EGF homology <EG08>

F;757-788/Domain: EGF homology <EG09>

F;795-826/Domain: EGF homology <EG10>

F;873-904/Domain: EGF homology <EG11>

F;911-942/Domain: EGF homology <EG12>

F;949-980/Domain: EGF homology <EG13>

F;987-1018/Domain: EGF homology <EG14>

F;1025-1056/Domain: EGF homology <EG15>

F;1063-1094/Domain: EGF homology <EG16>

F;1149-1180/Domain: EGF homology <EG17>

F;1187-1218/Domain: EGF homology <EG18>

F;1233-1264/Domain: EGF homology <EGF4>

F;1352-1383/Domain: EGF homology <EG19>

F;1391-1425/Domain: EGF homology <EGF>

F;1917-1948/Domain: ankyrin repeat homology <AN1>

F;1949-1981/Domain: ankyrin repeat homology <AN2>

F;1983-2015/Domain: ankyrin repeat homology <AN3>

F;2016-2048/Domain: ankyrin repeat homology <AN4> 

F;2049-2081/Domain: ankyrin repeat homology <AN5> 

Query Match 19.2%; Score 693; DB 2; Length 2531; 

Best Local Similarity 25.3%; Pred. No. 4.7e-31; 

Matches 217; Conservative 71; Mismatches 206; Indels 364; Gaps 51; 

Qy 10 SFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQI YYTSCTDILNWFKC 69 

I : : I I I I : II : I I : : I I : | | : : : : I 

Db 624 SYLCLCLKGTTGPNCEINLDD CA SNPCDS GTCLDKIDGYEC 664 

Qy 70 TRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMC VPHCA 110

III: : I I I : I I 

Db 665 A CEPGY--TGSMCNVNIDECAGSPCHNGGTCEDGIAG 699

Qy 111 DKCVHGRC IAPNTCQCEPGWGGTNC SSACDG 141

: I : I I I : I I I I I I I I I : : I : 

Db 700 FTCRCPEGYHDPTCLSEVNECNSNPCIHGACRDGLNGYKCDCAPGWSGTNCDINNNECES 759

Qy 142 DHWGPHCTSRCQCKNGALCNPITG--ACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQN 199

: I I I I : I I I I I I I : I I : I II 

Db 760 N PCVNGGTCKDMTSGYVCTCREGFSGPNCQ TNINECASN-PCLN 802

Qy 200 GATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVC HHVTGEC 253 

I | | | I : I II I I I I M : I I 11:1111 : : I 

Db 803 QGTCIDDVAGYKCNCPLPYTGATCEVVLAP C-ATS PCKNSGVCKESEDYESFSC 855

Qy 254 SCPSGWMGTVC GQPCPEGR FGKNCS QECQ 282 

11:1111 Ml I : I I : I : 

Db 856 VCPTGWQGQTCEVDINECVKSPCRHGASCQNTNGSYRCLCQAGYTGRNCESDIDDCRPNP 915 

Qy 283 CHNGGTCDAA--TGQCHCSPGYTGERCQDE CPVGT 315 

I I I I I : I | | | | | : I I : : : MM 

Db 916 CHNGGSCTDGINTAFCDCLPGFQGAFCEEDINECASNPCQNGANCTDCVDSYTCTCPVGF 975 

Qy 316 YGVLCAET CQCVNGGKCYHVSG ACLCEAGFAGERCEARLCPEGLYGI-KC 364 

|:| I I II I I I I I I I I I I : I : : I 

Db 976 NGIHCENNTPDCTESSCFNGGTC--VDGINSFTCLCPPGFTGSYCQ YDVNEC 1025

Qy 365 DKRCPCHLENTHSCHPMSG--ECACKPGWSGLYCNE TCS PGFYGEACQQI 412

I I I I II I : I I I : : I I I : I III 

Db 1026 DSR-PCLHGGT--CQDSYGTYKCTCPQGYTGLNCQNLVRWCDSAPCKNGGRCWQTNTQYH 1082

Qy 413 CSCQN GADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGIN CSSRCG 457 

II:: | : | | : : | I : I I I : I I : : I : I 

Db 1083 CECRSGWTGVNCDVLSVSCEVAAQKRGIDVTLLCQHGGLCVDEGDKHYCHCQAGYTGSYC 1142 

Qy 458 CKNDAVCSPVDG--SCTCKAGWHGVDCS 483 

I : 1 I I : I I I I I I : I I : I I 
Db 1143 EDEVDECSPNPCQNGATCTDYLGGFSCKCVAGYHGSNCSEEINECLSQPCQNGGTCIDLT 1202 



Qy 



484 



QCLNGGACNTLDG — TCTCAPGWRGE 523 



I i II I I I -III! I 1111 

Db 1203 NSYKCSCPRGTQGVHCEINVDDCHPPLDPASRSPKCFNNGTCVDQVGGYTCTCPPGFVGE 1262

Qy 524 KCE LPCQD-GTYGLNCAER CDC SHADGC 550 

: I i II II II : I hi 1=11 

Db 1263 RCEGDVNECLSNPCDPRGTQ--NCVQRVNDFHCECRAGHTGRRCESVINGCRGKPCKNGG 1320

Qy 551 HPTTGH-CRCLPGWSGVHCDS VCAEGRWGPNC 581

: I I I I I : I I : : I I i I 

Db 1321 VCAVASNTARGFICRCPAGFEGATCENDARTCGSLRCLNGGTCISGPRSPTCLCLGSFTG 1380 

Qy 582 SLPCY 586 

I I I I 

Db 1381 PECQFPASSPCVGSNPCY 1398 



RESULT 8 
A49175 

Motch B protein - mouse (fragment) 

N;Alternate names: Notch homolog

C;Species: Mus musculus (house mouse)

C;Date: 21-Jan-1994 #sequence_revision 05-Jan-1996 #text_change 08-Sep-2002

C;Accession: A49175; PH1570; S32113

R;Lardelli, M.; Lendahl, U.

Exp. Cell Res. 204, 364-372, 1993

A;Title: Motch A and Motch B--two mouse Notch homologues coexpressed in a wide
variety of tissues.

A;Reference number: A49175; MUID:93178563; PMID:8440332
A;Accession: A49175

A;Status: preliminary; nucleic acid sequence not shown
A;Molecule type: mRNA
A;Residues: 1-1203 <LAR>

A;Cross-references: EMBL:X68279; NID:g287989; PIDN:CAA48340.1; PID:g287990
A;Experimental source: embryo

A;Note: sequence extracted from NCBI backbone (NCBIP:126158)

C;Comment: This protein has many EGF repeats and lin-12/Notch repeats.

C;Comment: This protein is one of the neurogenic proteins controlling the

decision between ectodermal and neural fate for cells in the early embryo.

C;Superfamily: notch protein; ankyrin repeat homology; EGF homology

F;143-174/Domain: EGF homology <EGX1>

F;482-513/Domain: EGF homology <EGF1>

F;560-591/Domain: EGF homology <EGF>

F;674-705/Domain: EGF homology <EGX2>

F;712-743/Domain: EGF homology <EGF3>

F;836-867/Domain: EGF homology <EGX3>

Query Match 19.1%; Score 687; DB 2; Length 1203; 

Best Local Similarity 24.8%; Pred. No. 6.3e-31; 

Matches 221; Conservative 78; Mismatches 245; Indels 348; Gaps 56; 

Qy 3 ISLNSCLSFICL LLCH WIGTASPLNLE — DPNVCSHW ES 39 

I :: I I II : I : I I : : I I : I I :l 

Db 214 IDIDDCSSTPCLNGAKCIDHPNGYECQCATGFTGILCDENIDNCDPDPCHHGQCQDGIDS 273 

Qy 40 YSVTVQESYPHPF-- DQI -- YYTS CTDILNWFKCT RHRVSY 76

I : I III hi | | : : | : : | : : : 

Db 274 YTCICNPGYMGAICSDQIDECYSSPCLNDGRCIDLVNGYQCNCQPGTSGLNCEINFDDCA 333



Qy 77 RTAYRHG— EKTMYRRKSQCCPGF YESGEMCV 106 

11:11111 II- 

Db 334 SNPCMHGVCVDGINRYSCVCSPGFTGQRCNIDIDECASNPCRKGATCINDVNGFRCICPE 393 

Qy 107 PHC ADKCVHGRC IAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSR 151 

II : : I : I I I = : I h I : I I 

Db 394 GPHHPSCYSQVNECLSNPCIHGNCTGGLSGYKCLCDAGWVGVNCE — VDKN ECLSN 447 

Qy 152 CQCKNGALCNPITGA— CHCAAGFRGWRCE-— DRC EQGTYGNDCH-QRCQCQ- 198 

|:||||: i | | | : | : | : II I I I : I II 

Db 448 -PCQNGGTCNNLVNGYRCTCKKGFKGYNCQVNIDECASNPCLNQGTCFDDVSGYTCHCML 506

Qy 199 "NGATCDHVTGECRCPPGYTGAFCED LCPPGKHGPQCE QRC PCQ 241 

III I I II:: : I I I I 

Db 507 PYTGKNCQTVLAPCSPNPCENAAVCKEAPNFESFSCLCAPGWQGKRCTVDVDECISKPCM 566 

Qy 242 NGGVCHHVTGE--CSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTC--DAATG 294 

i I I I I : I I I I I : I I : • I I I I I : I I 

Db 567 NNGVCHNTQGSYVCECPPGFSGMDCEEDI NDCLANPCQNGGSCVDHVNTF 616

Qy 295 QCHCSPGYTGERCQDE CPVGTYGVLC AETC 324 

I I i I : I : : I I : I I I : I I I : I 

Db 617 SCQCHPGFIGDKCQTDMNECLSEPCKNGGTCSDYVNSYTCTCPAGFHGVHCENNIDECTE 676

Qy 325 -QCVNGGKCYHVSG ACLCEAGFAG ERCEAR LC 355 

I I I I I I ! : I I I I I I I : : I 

Db 677 SSCFNGGTC--VDGINS FSCLCPVGFTGPFCLHDINECSSNPCLNAGTCVDGLGTYRCIC 734 

Qy 356 PEGLYGIKCD KRCPCHLENTHSCHPMSGECACKPGWSGLYCNE 398 

I I I I I I I : I I I I I I I I I : 

Db 735 PLGYTGKNCQTLVNLCSRSPCKNKGTCVQEKARPHCLCPPGWDGAYCDVLNVSCKAAALQ 794 

Qy 399 TCSPGFYGEACQQ ICS CQNGADCDSVTG-- 426

I I : I I : : I : I I : I I I : I 

Db 795 KGVPVEHLCQHSGICINAGNTHHCQCPLGYTGSYCEEQLDECASNPCQHGATCNDFIGGY 854 

Qy 427 KCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCS PVDGSCTCKAGWHGVDCSIRC 486

: I I I I : : I : : I I : : I : I I : I I 

Db 855 RCECVPGYQGVNCE YEVDECQNQPCQNGGTCIDLVNHFKCS C 896 



Qy 487 PSGTWGFGC-- NL-TC QCLNGGAC-NTLDG-TCTCAPGWRGEKCE LPC 529

MM I I : I I I I I I I : M I I I I I M I : I I M 

Db 897 PPGTRGLLCEENIDECAGGPHCLNGGQCVDRIGGYTCRCLPGFAGERCEGDINECLSNPC 956 



Qy 530 -QDGTYGLNCAE RCDCSHA DGC HPTTGHC 557 

: I : I : M III I I II 

Db 957 SSEGS — LDCVQLKNNYNCICRSAFTGRHCETFLDVCPQKPCLNGGTCAVASNMPDGFIC 1014 

Qy 558 RCLPGWSGVHCDSVCAEGRW GPNC SLPC 585 

I I I I : I I I I I : : Ml I I I 

Db 1015 RCPPGFSGARCQSSCGQVKCRRGEQCIHTDSGPRCFCLNPKDCESGCASNPC 1066 



RESULT 9 
S45306 

notch 3 protein - mouse 



C;Species: Mus musculus (house mouse)

C;Date: 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change 02-Aug-2002
C;Accession: S45306

R;Lardelli, M.; Dahlstrand, J.; Lendahl, U.
Mech. Dev. 46, 123-136, 1994

A;Title: The novel Notch homologue mouse Notch 3 lacks specific epidermal growth

factor-repeats and is expressed in proliferating neuroepithelium.

A;Reference number: S45306; MUID:95001556; PMID:7918097

A;Accession: S45306

A;Status: preliminary

A;Molecule type: mRNA

A;Residues: 1-2318 <LAR>

A;Cross-references: EMBL:X74760; NID:g483580; PIDN:CAA52776.1; PID:g483581

C;Superfamily: notch protein; ankyrin repeat homology; EGF homology

F;163-195/Domain: EGF homology <EGF1>

F;474-505/Domain: EGF homology <EGF>

F;854-885/Domain: EGF homology <EGF2>

F;1839-1871/Domain: ankyrin repeat homology <AN1>

F;1872-1904/Domain: ankyrin repeat homology <AN2>

F;1906-1938/Domain: ankyrin repeat homology <AN3>

F;1939-1971/Domain: ankyrin repeat homology <AN4>

F;1972-2004/Domain: ankyrin repeat homology <AN5>

Query Match 19.0%; Score 685.5; DB 2; Length 2318; 

Best Local Similarity 24.4%; Pred. No. 1.2e-30; 

Matches 216; Conservative 59; Mismatches 195; Indels 417; Gaps 48; 

IPGFYESGEMCVPHCADKCVHGRCIAPNT CQCEPGWGGTNCSSACDGDHW 144 

Ml I : I : I I II: I I I I I I I I : I 

,PGF — EGQNCEVN-VDDCPGHRCLNGGTCVDGVNTYNCQCPPEWTGQFCTEDVD 2 77 

145 GPHCTSRCQ CKNGALCNPITG--ACHCAAG FRGWRCE 179 

II I I I I : I : I I I III 



Qy 


94 


Db 


225 


Qy 


145 


Db 


278 


Qy 


180 


Db 


332 


Qy 


224 


Db 


392 


Qy 


236 


Db 


452 


Qy 


265 


Db 


512 


Qy 


299 


Db 


572 


Qy 


326 



II I I III I : I 11111:11 I : 



-LCPPGKHGPQCE 2 35 

I I 11:11 



-QRCPCQNGGVC-HHVTG-ECSCPSGWMGTVC 2 64 

I II Mill II I : I I I I : I : : I 



II Mill: : I I I : I I I : I I 

lSTPCRNGAKCVDQPDGYECRCAEGFEGTLCERNVDDCSPDPCHHGRCVDGIASFSCAC 571 

'GYTGERCQDE CPVGTYGVLC AETCQ 325 

: : I I I I I I I : I 



- - C VN G G K C YH VS G AC L C E AG F AGE RC EARL CPEGLYGIKCDKRCP 369 

I : I I I : I : I 1 I I : I : I I I II 



Db 632 FGVCRDGINRYD CVCQPGFTGPLCNVEINECASSPCGEGGSCVDGENGFHC--LCP 685

Qy 370 CHLENTHS-CHPMSG--ECACKPGWSGLYCNE 398 

I : I II I Ihlllll I : : 

Db 686 PGSLPPLCLPANHPCAHKPCSHGVCHDAPGGFRCVCEPGWSGPRCSQS LAP DACES QPCQ 745 



Qy 399 TCS PGFYGEACQQI -- CS CQNGADCDSVTGK CTCAPGFKG 436

| | : | M | I : : I : I : : I I : I : I : I I I : : I
Db 746 AGGTCTSDGIGFRCTCAPGFQGHQCEVLSPCTPSLCEHGGHCESDPDRLTVCSCPPGWQG 805

Qy 437 IDCSTPCPLGTYGINCSS RCGCKNDAV CSP 466

: I : | | : | | : : III II

Db 806 PRCQQDVDECAGASPCGPHG-TCTNLPGNFRCICHRGYTGPFCDQDIDDCDPNPCLHGGS 864

Qy 467 -VDG SCTCKAGWHGVDC SIRCPSGTWGFGCNL 497

II ||:| |: I I : ii I II I :

Db 865 CQDGVGSFSCSCLDGFAGPRCARDVDECLSSPCGPGTCTDHVASFTCACPPGYGGFHCEI 924

Qy 498 TCQCLNGGACNTLDG TCTCAPGWRGEKC 525

I I I I I : I I : I i I I : I I

Db 925 DLPDCSPSSCFNGGTC--VDGVSSFSCLCRPGYTGTHCQYEADPCFSRPCLHGGICNPTH 982

Qy 526 ELPCQDGTYGLNCAERCD CSHADGCHPTTGHCRCLPGWSGVHCD 569

| | : : I I I I I : I I : I I I I I I I I I

Db 983 PGFECTCREGFTGSQCQNPVDWCSQAPCQNGGRCVQTGAYCICPPGWSGRLCDIQSLPCT 1042

Qy 570 SVCAEGRWGPNCSL PC 585

I I I I I I : I M
Db 1043 EAAAQMGVRLEQLCQEGGKCIDKGRSHYCVCPEGRTGSHCEHEVDPC 1089



RESULT 10 
A49128 

cell-fate determining gene Notch2 protein - rat 
C; Species: Rattus norvegicus (Norway rat) 

C;Date: 21-Jan-1994 #sequence_revision 18-Nov-1994 #text_change 02-Aug-2002 
C;Accession: A49128 

R;Weinmaster, G.; Roberts, V.J.; Lemke, G.

Development 116, 931-941, 1992
A;Title: Notch2: a second mammalian Notch gene.

A;Reference number: A49128; MUID:93202015; PMID:1295745

A;Accession: A49128

A;Status: preliminary; not compared with conceptual translation

A;Molecule type: mRNA

A;Residues: 1-2471 <WEI>

A;Experimental source: Schwann cell

A;Note: sequence extracted from NCBI backbone (NCBIP:127811)

C;Superfamily: notch protein; ankyrin repeat homology; EGF homology

F;264-295/Domain: EGF homology <EGX1>

F;799-830/Domain: EGF homology <EGF1>

F;877-908/Domain: EGF homology <EGX2>

F;1029-1060/Domain: EGF homology <EGF>

F;1067-1098/Domain: EGF homology <EGX3>

F;1153-1184/Domain: EGF homology <EGF3>

F;1191-1222/Domain: EGF homology <EGX4>

F;1876-1908/Domain: ankyrin repeat homology <AN1>

F;1909-1941/Domain: ankyrin repeat homology <AN2>

F;1943-1975/Domain: ankyrin repeat homology <AN3>
F;1976-2008/Domain: ankyrin repeat homology <AN4>
F;2009-2041/Domain: ankyrin repeat homology <AN5>

Query Match 19.0%; Score 685.5; DB 2; Length 2471;

Best Local Similarity 24.8%; Pred. No. 1.2e-30;

Matches 218; Conservative 83; Mismatches 240; Indels 339; Gaps 55; 

Qy 3 ISLNSCLSFICL LLCH WI GTASPLNLE--DPNVCSHW ES 39 

I : : I I I I : I : I I I : : I I : I I : I 

Db 531 IDIDDCSSTPCLNGAKCIDHPNGYECQCATGFTGTLCDENIDNCDPDPCHHGQCQDGIDS 590 

Qy 40 YSVTVQESYPHPF -- DQI -- YYTS CTDILNWFKCT RHRVSY 76

I : I III I : I | | : : | : : | : : : 

Db 591 YTCICNPGYMGAICSDQIDECYSSPCLNDGRCIDLVNGYQCNCQPGTSGLNCEINFDDCA 650 

Qy 77 RTAYRHGE--KTMYRRKSQCCPGFYESGEMC VPHCADK 112 

II : I I II I : I : I : I I 

Db 651 SNPCLHGACVDGINRYSCVCSPGF--TGQRCNIDIDECASNPCRKDATCINDVNGFRCMC 708

Qy 113 - CVHGRC IAPNTCQCEPGWGGTNCSSACDGDHWGPHCT 149

I : I I I : : I I : I I I i I I : I 
Db 709 PEGPHHPSCYSQVNECLSSPCIHGNCTGGLSGYKCLCDAGWVGINCE— VDKN ECL 762 

Qy 150 SRCQCKNGALCNPITGA — CHCAAGFRGWRCE DRC EQGTYGNDCH-QRCQC 197 

I I : I I I I : I I I I : I : I : II I M : I II 

Db 763 SN-PCQNGGTCNNLVNGYRCTCKKGFKGYNCQVNIDECASNPCLNQGTCLDDVSGYTCHC 821

Qy 198 Q NGATCDHVTGECRCPPGYTGAFCED LCPPGKHGPQCE QRC P 239 

I I I I I II:: I I I I I : I II 

Db 822 MLPYTGKNCQTVLAPCSPNPCENAAVCKEAPNFESFTCLCAPGWQGQRCTVDVDECVSKP 881

Qy 240 CQNGGVCHHVTGE--CSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTC--DAA 292

I I I : I I : I I I I I : I I : : I I I I I : I 

Db 882 CMNNGICHNTQGSYMCECPPGFSGMDCEEDI NDCLANPCQNGGSCVDKVN 931

Qy 293 TGQCHCSPGYTGERCQDE CPVGTYGVLC AETC 324 

I I I I I : I : : II : I I I : I I I : I 

Db 932 TFSCLCLPGFVGDKCQTDMNECLSEPCKNGGTCSDYVNSYTCTCPAGFHGVHCENNIDEC 991 

Qy 325 QCVNGGKCYHVSG ACLCEAGFAGERC EARLCPEGLYGI KC 364

I I I I I I I : I I I I I I I : I : I I : I 

Db 992 TESSCFNGGTC--VDGINSFSCLCPVGFTGPFCLHDINECSSNPCLNSGTCVDGLGTYRC 1049

Qy 365 DKRC PCHLENTHSCHPMSGECACKPGWSGLYCNE 398 

II I I : I : I I I I I I I I : 

Db 1050 TCPLGYTGKNCQTLVNLCSPSPCKNKGTCAQEKARPRCLCPPGWDGAYCDVLNVSCKAAA 1109 

Qy 399 TCSPGFYGEACQQ ICS CQNGADCDSVTG 426 

I I : I I : : I : I I : I I I I 

Db 1110 LQKGVPVEHLCQHSGICINAGNTHHCQCPLGYTGSYCEEQLDECASNPCQHGATCSDFIG 1169 

Qy 427 --KCTCAPGFKGI DCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSI 484 

: I I I I : : I : : I I : : I : I I : I 

Db 1170 GYRCECVPGYQGVNCE YEVDECQNQPCQNGGTCIDLVNHFKCS 1212



Qy 485 RCPSGTWGFGC-- NL-TC QCLNGGAC-NTLDG-TCTCAPGWRGEKCE L 527

II II I I I • I l l l I l i

Db 1213 -CPPGTRGLLCEENIDDCAGAPHCLNGGQCVDRIGGYSCRCLPGFAGERCEGDINECLSN 1271

Qy 528 PC-QDGTYGLNCAE RCDCSHA DGCH PTTG 555

II ■ I • ! • I .lit ii > , „™

Db 1272 PCSSEGS--LDCIQLKNNYQCVCRSAFTGRHCETFLDVCPQKPCLNGGTCAVASNVPDGF 1329

Qy 556 HCRCLPGWSGVHCDSVCAEGRW GPNCSLP 584

Db 1330


RESULT 11 
A40701 

tenascin-X precursor - human 
C; Species: Homo sapiens (man) 

C;Date: 10-Sep-1999 #sequence_revision 10-Sep-1999 #text_change 10-Dec-1999 
C;Accession: A40701; A33725; C42175 

R;Bristow, J.; Tee, M.K.; Gitelman, S.E.; Mellon, S.H.; Miller, W.L. 
J. Cell Biol. 122, 265-278, 1993

A;Title: Tenascin-X: a novel extracellular matrix protein encoded by the human
XB gene overlapping P450c21B.

A;Reference number: A40701; MUID:93300909; PMID:7686164

A;Accession: A40701

A;Status: preliminary

A;Molecule type: DNA

A;Residues: 1-3566 <BRI>

A;Cross-references: EMBL:X71937

R;Morel, Y.; Bristow, J.; Gitelman, S.E.; Miller, W.L.
Proc. Natl. Acad. Sci. U.S.A. 86, 6582-6586, 1989

A;Title: Transcript encoded on the opposite strand of the human steroid 21-
hydroxylase/complement component C4 gene locus.
A;Reference number: A33725; MUID:89367293; PMID:2475872
A;Accession: A33725
A;Molecule type: mRNA

A;Residues: 2748-3199,'V',3201-3298,'E',3299-3314,'G',3316-3566 <MOR>
A;Cross-references: GB:M25813; NID:g183069; PIDN:AAA35884.1; PID:g183070
R;Matsumoto, K.; Arai, M.; Ishihara, N.; Ando, A.; Inoko, H.; Ikemura, T.
Genomics 12, 485-491, 1992

A;Title: Cluster of fibronectin type III repeats found in the human major

histocompatibility complex class III region shows the highest homology with the

repeats in an extracellular matrix protein, tenascin.

A;Reference number: A42175; MUID:92217969; PMID:1373119

A;Accession: C42175

A;Molecule type: DNA

A;Residues: 1849-1936 <MAT>

A;Experimental source: clone 3.9kF3-l

A;Note: sequence extracted from NCBI backbone (NCBIP:95694)
C;Genetics:

A;Gene: GDB:TNXA; D6S103E; TNX; XA; XB
A;Cross-references: GDB:568487; OMIM:600261
A;Map position: 6p21.3-6p21.3

C;Superfamily: tenascin-X; EGF homology; fibrinogen beta/gamma homology; 
fibronectin type III repeat homology 
C;Keywords: extracellular matrix; glycoprotein 
F;435-461/Domain: EGF homology <EGF>
F;748-828/Domain: fibronectin type III repeat homology <3F1>
F;829-856/Domain: fibronectin type III repeat homology #status atypical <3F2>
F;873-953/Domain: fibronectin type III repeat homology <3F3>
F;975-1055/Domain: fibronectin type III repeat homology <3F4>
F;1078-1158/Domain: fibronectin type III repeat homology <3F5>
F;1167-1247/Domain: fibronectin type III repeat homology <3F6>
F;1248-1317/Domain: fibronectin type III repeat homology #status atypical <3F7>
F;1323-1403/Domain: fibronectin type III repeat homology <3F8>
F;1412-1492/Domain: fibronectin type III repeat homology <3F9>
F;1510-1590/Domain: fibronectin type III repeat homology <3F10>
F;1618-1676/Domain: fibronectin type III repeat homology #status atypical <3F11>
F;1678-1749/Domain: fibronectin type III repeat homology #status atypical <3F12>
F;1751-1831/Domain: fibronectin type III repeat homology <3F13>
F;1849-1929/Domain: fibronectin type III repeat homology <3F14>
F;1955-2035/Domain: fibronectin type III repeat homology <3F15>
F;2061-2141/Domain: fibronectin type III repeat homology <3F16>
F;2167-2246/Domain: fibronectin type III repeat homology <3F17>
F;2274-2354/Domain: fibronectin type III repeat homology <3F18>
F;2382-2462/Domain: fibronectin type III repeat homology <3F19>
F;2488-2568/Domain: fibronectin type III repeat homology <3F20>
F;2584-2664/Domain: fibronectin type III repeat homology <3F21>
F;2677-2757/Domain: fibronectin type III repeat homology <3F22>
F;2771-2851/Domain: fibronectin type III repeat homology <3F23>
F;2878-2958/Domain: fibronectin type III repeat homology <3F24>
F;2977-3067/Domain: fibronectin type III repeat homology #status atypical <3F25>
F;3078-3159/Domain: fibronectin type III repeat homology <3F26>
F;3167-3247/Domain: fibronectin type III repeat homology <3F27>
F;3255-3334/Domain: fibronectin type III repeat homology <3F28>
F;3349-3557/Domain: fibrinogen beta/gamma homology <FBG>



Query Match 18.9%; Score 682; DB 1; Length 3566;

Best Local Similarity 28.4%; Pred. No. 2.3e-30;

Matches 191; Conservative 41; Mismatches 178; Indels 262; Gaps 38;



[Alignment rows Qy 94 through Qy 383 are not recoverable from the scan; only the row labels and the sequence fragments listed here survive.]
Row starts: Qy 94 / Db 125; Qy 142 / Db 174; Qy 166 / Db 232; Qy 224 / Db 279; Qy 283 / Db 321; Qy 342 / Db 360.
Fragments (leading text lost):
-HWGPHC TSRCQCKNGALCNPITG 165
-GRCVCDPGYTGDDCGMR 27 f
-CSQRGRCEN-
-CSQRGRCEG-
-GRCVCNPGYTGEDCGVRSCPRG CSQRGR 320
-GLYGIKCDKR-CPCHLENTHSCHPMSG 383



Qy 384 ECACKPGWSGLYCNE TCSPGFYGEAC-QQIC — SCQNGADCD 422 

| | | I :: I I I : I : I I I : I I : I = 

Db 418 RCVCWPGYTGTDCGSRACPRDCRGRGRCENGVCVCNAGYSGEDCGVRSCPGDCRGRGRCE 477 

Qy 423 SVTGKCTCAPGFKGIDCST PCPLGTYGINCSS-RC— GCKND 461 

11:1111:1111 ill: : 

Db 478 S--GRCMCWPGYTGRDCGTRACPGDCRGRGRCVDGRCVCNPGFTGEDCGSRRCPGDCRGH 535

Qy 462 AVCS PVDGS CTCKAGWHGVDCS I R- CP SGTWGFGCNLTCQCLNG 504 

: I I I I I I I : I I I I I I I I II I I I : I 

Db 536 GLCE--DGVCVCDAGYSGEDCSTRSCPGGCRGRG QCLDGRCVCEDGYSGEDCGVR 588 

Qy 505 GACNTLDGTCTCAPGWRGEKCELP CQDGTYGLN 537 

I I II I I I: I I : III 

Db 589 QCPNDCSQHGVCQ--DGVCICWEGYVSEDCSIRTCPSNCHGRGRCEEGRCLCDPGYTGPT 646 

Qy 538 CAER CDCSHADGCHPTTGHCRCLPGWSGVHC DSV 571 

I I I I I I I I I i : I I I 

Db 647 CATRMCPADCRGRGRC--VQGVCLCHVGYGGEDCGQEEPPASACPGGCGPRELCRAGQCV 704



Qy 572 CAEGRWGPNCSL 583 

III I I : I : : 
Db 705 CVEGFRGPDCAI 716



RESULT 12 

T31070 

notch homolog - sea urchin (Lytechinus variegatus) 
C;Species: Lytechinus variegatus (variegated urchin)
C;Date: 22-Oct-1999 #sequence_revision 22-Oct-1999 #text_change 31-Jan-2000
C;Accession: T31070
R;Sherwood, D.R.; McClay, D.R.
Development 124, 3363-3374, 1997
A;Title: Identification and localization of a sea urchin Notch homologue:
insights into vegetal plate regionalization and Notch receptor regulation.
A;Reference number: Z20966; MUID:97454256; PMID:9310331
A;Accession: T31070
A;Status: preliminary; translated from GB/EMBL/DDBJ
A;Molecule type: mRNA
A;Residues: 1-2531 <SHE>
A;Cross-references: EMBL:AF000634; NID:g2570350; PID:g2570351; PIDN:AAB82088.1
C;Superfamily: notch protein; ankyrin repeat homology; EGF homology

Query Match 18.9%; Score 681.5; DB 2; Length 2531; 

Best Local Similarity 24.2%; Pred. No. 2e-30; 

Matches 216; Conservative 68; Mismatches 215; Indels 393; Gaps 53; 

Qy 6 NSCLSFICLLLCHWIGTASPLNLED--PNVCSHWESYSVTVQESYPHPFDQIYYTSCTDI 63 

I : : I I : : I I : I : I I I I : I I 

Db 336 NTYGNFSCICVRGWEGQTCEINKDDCTPNPCQ FEGECEDR 375 

Qy 64 LNWFKCTRH RVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRC 118 

: I I I I III : | : | | I I : I 

Db 376 VASFKCT CPPG--RTGLLC--HLEDACMSNPCHHTAQ 408 

Qy 119 IAPNT — CQCEPGWGGTNCSSACDGDHWGPHCTSRCQ— CKNGALCNPITG— ACH 168 

: : I I I : I I I I I I : I : = I I I : I 



Db 409 CSTSWDGSFICDCATGYQGFNCSEDID ECSLSMDSICQSGGTCQNFDGGWSCL 462 

Qy 169 CAAGFRGWRCE— DRCE QGTYG NDCHQRCQ 196 

I : : I I I I I I I I : : I I I : I 

Db 463 CSSGFTGSRCETDIDECDDDPCYNGGTCLNKRGGYACICLTGFTGTLCETDINECSSN-P 521 

Qy 197 CQNGATCDHVTG — ECRCPPGYTGAFCE DLCPPGK 229 

I I I I : I : I I I I I I I I I I = ill 
Db 522 CLNGASCFDITGRFECACLAGYTGTTCQVNIDDCQSSPCENGGTCIDGVNQFTCLCETGY 581 

Qy 230 HGPQCE QRC PCQNGGVCHHVTG— ECSCPSGWMGTVC GQP 267 

I : I I I I I I i I II : I : I : M I I I I 

Db 582 EGHRCEMDSDECASRPCMNGGVCEDLIGFYQCNCPVGTSGDNCEYNHYDCSSNPCVNDGT 641 

Qy 268 CPEGRFGKNCSQ ECQ CHNGGTC-DAATG 294 

I I I I I I : : I : I I I I I I I I I 
Db 642 CVDGINEYTCMCHEGYRGLNCEEDIDDCESRPCHNGGTCVDEVNGYHCLCPIGYHDPFCM 701



Qy 295 QCHCSPGYTGERCQ DECPVGTYGVLCAETCQCV 327

I I I I I I I I I I I : i 

Db 702 SNINECSSNPCVNGGSCHDGVNEYSCECMAGYTGTRCTDDFDEC SSNPCQ 751 



Qy 328 NGGKC -- YHVSGACLCEAGFAGERCEARL CPEGLYG 361

: I I I I I I : I I : I I I : I I : I 

Db 752 HGGTCDNRHAFYNCTCQAGYTGLNCEVNIDDCVDEPCLNGGICIDEVNSFQCVCPQTFVG 811 

Qy 362 IKCD-KRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQ ICS 414

: I : : I I I : I I : : I I I : I I I I 

Db 812 LLCETERSPC EDNQCQ-NGATCVYSEDYAGYSCR--CTSGFQGNFCDDDRNECLFS P 865 

Qy 415 CQNGADCDSVTG--KCTCAPGFKGIDC STP CPLGT 447 

I : i | I : : I : I : I I I : I I I I II 

Db 866 CRNGGSCTNLEGSFECSCLPGYDGPICEINIDECASGPCTNGGICTDLIDDYFCSCQRGF 925 

Qy 448 YGINC SSRC GCKNDAVCSP-VDG-SCTCKAGWHGVDCSIRCPSGTWGFGCNLTC 499 

Ml : I I : I I I I I : I : I I : I : I I I I 

Db 926 TGKNCQNDTDECLSSPCRNGATCHEYVDS YTCSCLVGFSGMHCEINDQDCT TS 978 

Qy 500 QCLNGGACNTLDG TCTCAPGWRGEKCEL PCQDGTYGLNCAER 541 

I I I I I : I I III 1:1 I : : I i : : I I • I 

Db 979 SCLYGGTC— IDGVNSYTCECVTGYTGSNCQIEINECDSDPCENGA TCQDRFGSYSC 1033 

Qy 542 -CD CSH-ADGCHP TTGH CRCLPGWSGVHCD 569

II I I II II I I I I II 

Db 1034 HCDVGFTGLNCEHWQWCSPQNNPCYNGATCVAMGHLYECHCASNWIGKLCDVPKVSCDI 1093 

Qy 570 SVCAEGRWGPNC SLPCY 586

: I : I I I III: 
Db 1094 AASDKNVTRSELCLNGGTCIDATSSHSCLCQDGYTGSYCEVNIDECASAPCH 1145



RESULT 13 
A24420 

notch protein - fruit fly (Drosophila melanogaster)
N;Alternate names: neurogenic repetitive locus protein
C;Species: Drosophila melanogaster
C;Date: 10-Sep-1999 #sequence_revision 10-Sep-1999 #text_change 10-Sep-1999



C;Accession: A24420; A24768; S09358; A05267 
R;Kidd, S.; Kelley, M.R.; Young, M.W. 
Mol. Cell. Biol. 6, 3094-3108, 1986 

A;Reference number: A24420; MUID:87064624; PMID:3097517
A;Accession: A24420
A;Molecule type: DNA
A;Residues: 1-2703 <KID>
A;Cross-references: GB:K03508; NID:g157991; PIDN:AAA28725.1; PID:g157993
R;Wharton, K.A.; Johansen, K.M.; Xu, T.; Artavanis-Tsakonas, S.
Cell 43, 567-581, 1985
A;Reference number: A24768; MUID:86079539; PMID:3935325
A;Accession: A24768
A;Molecule type: mRNA
A;Residues: 1-48,'I',50-118,'R',120-230,'I',232-256,'N',258-266,'A',268-872,'R',874-958,'R',960-1970,'FH',1973-2256,'G',2258-2264,'V',2266-2406,'R',2408-2444,'L',2446-2703 <WHA1>

A;Note: the authors translated the codon ATC for residue 49 as Thr, ATT for 
residue 2044 as Arg, GTA for residue 2265 as Ala, CGC for residue 2407 as His, 
and CTT for residue 2445 as Arg 
R;Tautz, D. 

Nucleic Acids Res. 17, 6463-6471, 1989 

A;Title: Hypervariability of simple sequences as a general source for
polymorphic DNA markers.
A;Reference number: S09358; MUID:89385974; PMID:2780284
A;Accession: S09358
A;Molecule type: DNA
A;Residues: 2505-2551,'QQQQ',2552-2576,'E',2578-2604 <TAU>
R;Wharton, K.A.; Yedvobnick, B.; Finnerty, V.G.; Artavanis-Tsakonas, S.
Cell 40, 55-62, 1985
A;Title: opa: a novel family of transcribed repeats shared by the Notch locus
and other developmentally regulated loci in D. melanogaster.
A;Reference number: A05267; MUID:85099329; PMID:2981631
A;Accession: A05267
A;Molecule type: DNA
A;Residues: 2504-2576,'E',2578-2611 <WHA2>

C;Genetics:
A;Gene: notch; opa
A;Cross-references: FlyBase:FBgn0004647
A;Map position: 8.96-9.36
A;Introns: 53/3; 84/3; 171/3; 240/3; 283/3; 2333/3; 2436/3; 2588/3
C;Superfamily: notch protein; ankyrin repeat homology; EGF homology
C;Keywords: differentiation; tandem repeat; transmembrane protein
F;27-43/Domain: transmembrane #status predicted <TMM1>
F;297-328/Domain: EGF homology <EGX1>
F;530-561/Domain: EGF homology <EGF1>
F;568-599/Domain: EGF homology <EGF>
F;988-1019/Domain: EGF homology <EGX2>
F;1064-1095/Domain: EGF homology <EGF3>
F;1187-1218/Domain: EGF homology <EGX3>
F;1746-1762/Domain: transmembrane #status predicted <TMM2>
F;1950-1982/Domain: ankyrin repeat homology <AN1>
F;1983-2015/Domain: ankyrin repeat homology <AN2>
F;1988-2004/Domain: transmembrane #status predicted <TMM3>
F;2017-2049/Domain: ankyrin repeat homology <AN3>
F;2050-2082/Domain: ankyrin repeat homology <AN4>
F;2083-2115/Domain: ankyrin repeat homology <AN5>
F;2538-2568/Region: glutamine-rich
F;2538-2568/Domain: neurogenic repetitive element #status predicted <OPA>



Query Match 18.8%; Score 677; DB 1; Length 2703; 

Best Local Similarity 25.4%; Pred. No. 3.7e-30; 

Matches 208; Conservative 78; Mismatches 203; Indels 330; Gaps 51; 

Qy 7 SCL SFICLLLCHWIGTASPLNLED--PNVCSHWESYSVTVQESYPHPFDQIYYTSC 60 

IN :| |: : : II ::::: I I : =1 

Db 502 SCLDDPGTFRCVCMPGFTGTQCEIDIDECQSNPC LNDGTC 541 

Qy 61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMC VPHCADKCVHGR 117 

|:||||: | | | : | | : I : I 

Db 542 HDKINGFKCS CALGF--TGARCQINIDDCQSQPCRNR 576

Qy 118 CIAPNTCQCEPGWGGTNCS SACDGDHWGPHCTSRCQCKNGALCNPITG-ACH 168

I I : I : i I I : I I : I : I I : II : : I 

Db 577 GICHDSIAGYSCECPPGYTGTSCEININDCDSN PCHRGKCIDDVNSFKCL 626 



Qy 169 CAAGFRGW RCEDR CEQGTYG NDCHQRCQ 196

| | : | : I : II I : M I I : I I 

Db 627 CDPGYTGYICQKQINECESNPCQFDGHCQDRVGSYYCQCQAGTSGKNCEVNVNECHSN-P 685 



Qy 197 CQNGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVC-HHVTG-E 252 

| I I I M I : : I : I I I : I I i I I : = I MINI I I : 

Db 686 CNNGATCIDGINSYKCQCVPGFTGQHCE KNVDECIS-SPCANNGVCIDQVNGYK 738 

Qy 253 CSCPSGWMGTVC GQP CPEGRFGKNCS QECQ-- 282

I III: I I Ml III M 

Db 739 CECPRGFYDAHCLSDVDECASNPCVNEGRCEDGINEFICHCPPGYTGKRCELDIDECSSN 798 

Qy 283 -CHNGGTC-DAATG-QCHCSPGYTGERCQ— -DECPVGTYGVLCAETCQCVNGGKCY-HV 335 

I : I I I I I I I I I I I I : : I : I : I I I I I I I I 

Db 799 PCQHGGTCYDKLNAFSCQCMPGYTGQKCETMIDDC VTNPCGNGGTCIDKV 848 

Qy 336 SG-ACLCEAGFAGERCEARLCPEGLYGIKC-DKRCPCHLENTHSCHPMSG ECACKP 389 

: | | : I : I I I I : :: I I I I : I I I I I I I 

Db 849 NGYKCVCKVPFTGRDCESKMDP CASNRC KNEAKCTPS SNFLDFSCTCKL 897



Qy 390 GWSGLYCNE TCSPGFYGEAC QQICS CQN 417

|::| ||:| I: M I I 1= Ml 

Db 898 GYTGRYCDEDIDECSLSSPCRNGASCLNVPGSYRCLCTKGYEGRDCAINTDDCASFPCQN 957



Qy 418 GADCDSVTG--KCTCAPGFKGIDCST PCPLGTYGI 450

II I I I I I I I I MM M 

Db 958 GGTCLDGIGDYSCLCVDGFDGKHCETDINECLSQPCQNGATCSQYVNSYTCTCPLGFSGI 1017 



Qy 451 NCS SRCGCKNDAVCSPVDG SCTCKAGWHGVDCSI R 485

|| : || |:|| : I : I I I : I : I : 

Db 1018 NCQTNDEDCTESSCLNGGSC--IDGINGYNCSCLAGYSGANCQYKLNKCDSNPCLNGATC 1075 



Qy 486 CPSGTWGFGCNL TCQCLNGGACNTL--DGTCTCAPGWRGEKCE- 526

I M | I I : III I : : : I I : I I I : I : 

Db 1076 HEQNNEYTCHCPSGFTGKQCSEYVDWCGQSPCENGATCSQMKHQFSCKCSAGWTGKLCDV 1135 

Qy 527 — LPCQDGT— YGLNCAERCD CSHADGCHPTTGHCRCLPGWSGVHC 568 

: I I I I I : : I : I I | | I : : I : I 

Db 1136 QTISCQDAADRKGLSLRQLCNNGTCKDYGNSHV CYCSQGYAGSYCQKEIDECQSQP 1191 



Qy 569 DSVCAEGRWGPNCSL --PC 585 

: I : I I I I I II 
Db 1192 CQNGGTCRDLIGAYECQCRQGFQGQNCELNIDDCAPNPC 1230 



RESULT 14 
T09070 

probable tenascin X - mouse 

C; Species: Mus musculus (house mouse) 

C;Date: 11-Jun-1999 #sequence_revision 11-Jun-1999 #text_change 21-Jan-2000
C;Accession: T09070
R;Rowen, L.; Mahairas, G.; Qin, S.; Ahearn, M.E.; Dankers, C.; Lasky, S.;
Loretz, C.; Schmidt, S.; Tipton, S.; Traicoff, R.; Zackrone, K.; Hood, L.
submitted to the EMBL Data Library, October 1997
A;Description: Sequence of the mouse major histocompatibility locus class III
region.
A;Reference number: Z16543
A;Accession: T09070
A;Status: preliminary; translated from GB/EMBL/DDBJ
A;Molecule type: DNA
A;Residues: 1-4006 <ROW>
A;Cross-references: EMBL:AF030001; NID:g2564945; PID:g2564958

C;Genetics:
A;Gene: TNX
A;Map position: 17
A;Introns: 124/1; 735/1; 773/3; 826/1; 914/1; 1037/1; 1135/1; 1232/1; 1329/1;
1440/1; 1548/1; 1645/1; 1745/1; 1841/1; 1938/1; 2045/1; 2154/1; 2253/1; 2367/1;
2476/1; 2589/1; 2696/1; 2804/1; 2911/1; 3019/1; 3115/1; 3208/1; 3302/1; 3405/1;
3517/1; 3558/1; 3606/1; 3646/1; 3694/1; 3737/3; 3782/1; 3832/3; 3865/1; 3919/1;
3973/3
C;Superfamily: tenascin-X; EGF homology; fibrinogen beta/gamma homology;
fibronectin type III repeat homology
C;Keywords: extracellular matrix
F;422-448/Domain: EGF homology <EGF>
F;826-906/Domain: fibronectin type III repeat homology <3FR>
F;3789-3997/Domain: fibrinogen beta/gamma homology <FBG>

Query Match 18.8%; Score 676.5; DB 2; Length 4006; 

Best Local Similarity 29.1%; Pred. No. 5e-30; 

Matches 190; Conservative 50; Mismatches 178; Indels 235; Gaps 38; 

Qy 82 HGEKTMYRRKSQCCPGF YESGEMCVPHCAD--KCVHGRCIAPNT 123

[Rows Db 135, Qy 124, Qy 183 and Qy 242 are not recoverable from the scan; the Db rows below may be missing leading residues.]

Db 191 -CFPGYSGPSCSWPSCPGD CQGRGRC VKGVCVCRAGFSG PDC 231

Db 232 fCNQRGRCEE GRCVCDPGYSGEDCGVRSCPRG CSQRGRCE 278

Db 279 -GLCVCNPGYSGEDCGVRNCPRG CSQRGRCED GRCVCDP 317



Qy 301 GYTGERCQDECPVGTYGVLCAETC--QCVNGGKCYHVSGACLCEAGFAGERCEARLCPEG 358 

||:||| II I : I I : I I I hi I : : I I I I I I 

Db 318 GYSGEDCS MRTCPWDCGDGGRC--VDGRCVCWPGYSGEDCSTRTCPRD 363 

Qy 359 LYGI-KC-DKRCPCHLE NTHSC HPMSGECACKPGWSGLYCNE 398 

| : I I I I II | | | | | | : : | | : 

Db 364 CRGRGRCEDGECICDAGYSGDDCGVRSCPGDCNQRGHCEDGRCVCWPGYTGADCSTRACP 423

Qy 399 TCSPGFYGEAC-QQIC — SCQNGADCDSVTGKCTCAPGFKGIDCST 441 

I I : I I I : I I : : I : I I : I I I I : I I I I 
Db 424 RDCRGRGRCEDGVCVCHAGYSGEDCGVRSCPGDCRGRGNCES--GRCVCWPGYTGRDCGT 481

Qy 442 PCPLGTYGINCS S-RC--GCKNDAVCSPVDGSCTCKAGWHGV 480 

I I I : I I I 1 I : I : I I I I : I 

Db 482 RACPGDCRGRGRCVDGRCVCNPGFTGEDCGSRRCPGDCRGHGHCE--NGVCVCAVGYSGD 539

Qy 481 DCSIR-CPSGTWGFGCNLTCQCLNG GACNTLDGTC 514 

I I I I I I I II I I I I I I I I I I 

Db 540 DCSTRSCPSDCRGRG QCLNGLCECDEGYSGEDCGIRRCPRDCSQHGVCQ--DGLC 592 

Qy 515 TCAPGWRGEKCEL PCQDGTYGLN CAER CDCSHADGCHPT 553 

I I : I I I : I : I I I III II I 

Db 593 MCHAGYAGEDCS IRTCPADCRRRGRCEDGRCVCNPGYTGPACATRTCPADCRGRGRC V 650 

Qy 554 TGHCRCLPGWSGVHC DSVCAEGRWGPNCSL 583 

1111:111 | | | | I I : ! : : 

Db 651 QGVCMCYVGYSGEDCGQEEPPASACPGGCGPRELCRAGQCVCVEGFRGPDCAI 703



RESULT 15 
S18188 

notch protein homolog - rat 

C; Species: Rattus norvegicus (Norway rat) 

C;Date: 19-Feb-1994 #sequence_revision 10-Nov-1995 #text_change 02-Aug-2002 
C;Accession: S18188 

R;Weinmaster, G.; Roberts, V.J.; Lemke, G.
Development 113, 199-205, 1991
A;Title: A homolog of Drosophila Notch expressed during mammalian development.
A;Reference number: S18188; MUID:92111383; PMID:1764995
A;Accession: S18188
A;Molecule type: mRNA 
A;Residues: 1-2531 <WEI> 

A;Cross-references: EMBL:X57405; NID:g57634; PID:g57635
C;Superfamily: notch protein; ankyrin repeat homology; EGF homology
F;987-1018/Domain: EGF homology <EGF1>
F;1025-1056/Domain: EGF homology <EGF>
F;1233-1264/Domain: EGF homology <EGF2>
F;1917-1949/Domain: ankyrin repeat homology <AN1>
F;1950-1982/Domain: ankyrin repeat homology <AN2>
F;1984-2016/Domain: ankyrin repeat homology <AN3>
F;2017-2049/Domain: ankyrin repeat homology <AN4>
F;2050-2082/Domain: ankyrin repeat homology <AN5>

Query Match 18.7%; Score 675; DB 2; Length 2531; 

Best Local Similarity 25.7%; Pred. No. 4.6e-30; 

Matches 208; Conservative 70; Mismatches 206; Indels 324; Gaps 48; 



Qy 4 SLNSCLS FICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESY 48

: : | | | : : I I I I : I I : I M

Db 603 NINECHSQPCRHGGTCQDRDNYYLCLCLKGTTGPNCEINLDD CA 646



Qy 49 PHPFDQIYYTSCTDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMC 105 
: I I :| | :: ::| I ||: :| II 

Db 647 SNPCDS GTCLDKIDGYECA CEPGY-- TGSMCNVN 678



Qy 106 VPHCA DKCVHGRC IAPNT 123

: || : hi I I : 

Db 679 IDECAGSPCHNGGTCEDGIAGFTCRCPEGYHDPTCLSEVNECNSNPCIHGACRDGLNGYK 738 



Qy 124 CQCEPGWGGTNC -- SSACDGDHWGPHCTSRCQCKNGALCNPITG-- ACHCAAGFRGWRC 178

I I Ml II I I : : I : : INI: I I 

Db 739 CDCAPGWSGTNCDINNNECESN PCVNGGTCKDMTSGYVCTCREGFSGPNC 788 

Qy 179 EDRCEQGTYGNDCHQRCQCQNGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQ 236 

: | | : | II I II M : M I III) II : I I 
Db 789 Q TNINECASN-PCLNQGTCIDDVAGYKCNCPLPYTGATCEWLAP C-A 834

Qy 237 RCPCQNGGVC HHVTGECSCPSGWMGTVC GQPCPEGR 272 

I | : | | | I : : M I : I M I II I 

Db 835 TSPCKNSGVCKESEDYESFSCVCPTGWQGQTCEIDINECVKSPCRHGASCQNTNGSYRCL 894



Qy 273 FGKNCS QECQ CHNGGTCDAATGQ -- CHCSPGYTGERCQDE 310

I : I I : I : INI: ! I I M M : : 

Db 895 CQAGYTGRNCESDIDDCRPNPCHNGGSCTDGVNAAFCDCLPGFQGAFCEEDINECATNPC 954 



Qy 311 C PVGT YGVLCAET CQCVNGGKCYHVSG ACLCEAG 344

|| | I : I I I II I I I Mil 

Db 955 QNGANCTDCVDSYTCTCPTGFNGIHCENNTPDCTESSCFNGGTC--VDGINSFTCLCPPG 1012

Qy 345 FAGERCEARLCPEGLYGI-KCDKRCPCHLENTHSCHPMSG— ECACKPGWSGLYCNE 398 

Ml: I : : I I I I I II I : I I I : : M I 

Db 1013 FTGSYCQ YDWECDSR-PCLHGGT--CQDSYGTYKCTCPQGYTGLNCQNLVR 1061 



Qy 399 --TCSPGFYGEACQQI CSCQN GADCDSVTGKCTCAPGFKGIDCSTPCPLGTY 448

: | III II:: Mil — I I MM : I I 

Db 1062 WCDSAPCKNGGKCWQTNTQYHCECRSGWTGFNCDVLSVSCEVAAQKRGIDVTLLCQHGGL 1121 



Qy 449 GIN CSSRCG CKNDAVCSPVDG-- SCTCKAGWHGVDCS 483

| : | MIIM I II I I MM M II 

Db 1122 CVDEEDKHYCHCQAGYTGSYCEDEVDECSPNPCQNGATCTDYLGGFSCKCVAGYHGSNCS 1181 



Qy 484 IRCPSGTWGFGCNLT C QCLNG 504 

II I II M I : I I 

Db 1182 EEINECLSQPCQNGGTCIDLTNTYKCSCPRGTQGVHCEINVDDCHPPLDPASRSPKCFNN 1241 



Qy 505 GACNTLDG-- TCTCAPGWRGEKCE LPCQD-GTYGLNCAERCDCSHADGCHPT 553

|| I I I I M M I M M M M IMM 

Db 1242 GTCVDQVGGYTCTCPPGFVGERCEGDVNECLSNPCDPRGTQ — NCVQRVN 1289 

Qy 554 TGHCRCLPGWSGVHCDSVCAEGRWGPNC 581 

I I I Ml MM I I I 

Db 1290 DFHCECRAGHTGRRCESV-INGCRGKPC 1316 

Search completed: March 26, 2004, 16:12:14 
Job time : 19.883 secs
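
The percentage fields printed with each result above appear to follow from the other numbers on the same lines: Query Match agrees with Score divided by the Perfect score (3601), and Best Local Similarity agrees with Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). These relationships are inferred from the figures in this report, not taken from GenCore documentation. The sketch below parses one statistics block (RESULT 12) and recomputes both values; the function names are illustrative.

```python
import re

PERFECT_SCORE = 3601.0  # "Perfect score" reported in the search header


def parse_stats(block):
    """Collect the numeric 'Name value;' fields of one result's statistics lines."""
    pairs = re.findall(r"([A-Za-z][A-Za-z. ]*?)\s+([0-9][0-9.eE+-]*)%?;", block)
    return {name.strip(): float(value) for name, value in pairs}


def query_match(stats, perfect=PERFECT_SCORE):
    """Score as a percentage of the perfect (self-match) score (assumed relation)."""
    return 100.0 * stats["Score"] / perfect


def best_local_similarity(stats):
    """Matches as a percentage of all alignment columns, indels included (assumed relation)."""
    columns = (stats["Matches"] + stats["Conservative"]
               + stats["Mismatches"] + stats["Indels"])
    return 100.0 * stats["Matches"] / columns


# Statistics lines of RESULT 12 (T31070), copied from the report.
result_12 = """
Query Match 18.9%; Score 681.5; DB 2; Length 2531;
Best Local Similarity 24.2%; Pred. No. 2e-30;
Matches 216; Conservative 68; Mismatches 215; Indels 393; Gaps 53;
"""

stats = parse_stats(result_12)
print(f"Query Match: {query_match(stats):.1f}% (reported {stats['Query Match']}%)")
print(f"Best Local Similarity: {best_local_similarity(stats):.1f}% "
      f"(reported {stats['Best Local Similarity']}%)")
```

The same check can be run on the statistics block of any other result in the listing.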



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 
Run on: March 26, 2004, 16:11:16 ; Search time 27.1611 Seconds
(without alignments)
5645.353 Million cell updates/s

Title: US-10-092-390-4
Perfect score: 3601
Sequence: 1 MVISLNSCLSFICLLLCHWI..........HCDSVCAEGRWGPNCSLPCY 586

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5

Searched: 1065169 seqs, 261661801 residues

Total number of hits satisfying chosen parameters: 1065169

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries
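
The "Million cell updates/s" figure in this header is consistent with counting one cell per (query residue, database residue) pair and dividing by the search time. That definition of a cell is an assumption based on the usual dynamic-programming matrix, not something stated in the report. A minimal sketch with the values from this header:

```python
QUERY_LENGTH = 586         # residues in the query sequence US-10-092-390-4
DB_RESIDUES = 261_661_801  # "Searched: 1065169 seqs, 261661801 residues"
SEARCH_SECONDS = 27.1611   # "Search time" (without alignments)


def million_cell_updates_per_second(query_len, db_residues, seconds):
    """Throughput, assuming one matrix cell per query/database residue pair."""
    cells = query_len * db_residues
    return cells / seconds / 1e6


rate = million_cell_updates_per_second(QUERY_LENGTH, DB_RESIDUES, SEARCH_SECONDS)
print(f"{rate:.3f} Million cell updates/s (reported 5645.353)")
```

The computed value agrees with the reported one to within the rounding of the printed search time.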



Database : Published_Applications_AA:*
/cgn2_6/ptodata/2/pubpaa/US07_PUBCOMB.pep:*
/cgn2_6/ptodata/2/pubpaa/PCT_NEW_PUB.pep:*
/cgn2_6/ptodata/2/pubpaa/US06_NEW_PUB.pep:*
/cgn2_6/ptodata/2/pubpaa/US06_PUBCOMB.pep:*
/cgn2_6/ptodata/2/pubpaa/US07_NEW_PUB.pep:*
/cgn2_6/ptodata/2/pubpaa/PCTUS_PUBCOMB.pep:*
/cgn2_6/ptodata/2/pubpaa/US08_NEW_PUB.pep:*
/cgn2_6/ptodata/2/pubpaa/US08_PUBCOMB.pep:*
/cgn2_6/ptodata/2/pubpaa/US09A_PUBCOMB.pep:*
/cgn2_6/ptodata/2/pubpaa/US09B_PUBCOMB.pep:*
/cgn2_6/ptodata/2/pubpaa/US09C_PUBCOMB.pep:*
/cgn2_6/ptodata/2/pubpaa/US09_NEW_PUB.pep:*
/cgn2_6/ptodata/2/pubpaa/US10A_PUBCOMB.pep:*
/cgn2_6/ptodata/2/pubpaa/US10B_PUBCOMB.pep:*
/cgn2_6/ptodata/2/pubpaa/US10C_PUBCOMB.pep:*
/cgn2_6/ptodata/2/pubpaa/US10_NEW_PUB.pep:*
/cgn2_6/ptodata/2/pubpaa/US60_NEW_PUB.pep:*
/cgn2_6/ptodata/2/pubpaa/US60_PUBCOMB.pep:*
Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.
Result No.  Score  Query Match (%)  Length  DB  ID                   Description
26                                          10  US-09-842-758-57     Sequence 57, Appl
27          843.5  23.4             830     10  US-09-751-708A-134   Sequence 134, App
28          843.5  23.4             830     10  US-09-751-708A-140   Sequence 140, App
29          843.5  23.4             830     12  US-10-174-333-57     Sequence 57, Appl
30          808    22.4             865     10  US-09-842-758-20     Sequence 20, Appl
31          808    22.4             865     12  US-10-174-333-20     Sequence 20, Appl
32          808    22.4             866     12  US-10-433-579-4      Sequence 4, Appli
33          783.5  21.8             934     10  US-09-842-758-18     Sequence 18, Appl
34          783.5  21.8             934     12  US-10-174-333-18     Sequence 18, Appl
35          779    21.6             296     10  US-09-866-050A-458   Sequence 458, App
36          779    21.6             299     10  US-09-866-050A-192   Sequence 192, App
37          779    21.6             299     10  US-09-866-050A-332   Sequence 332, App
38          721.5  20.0             310     14  US-10-084-994-12     Sequence 12, Appl
39          721.5  20.0             310     14  US-10-193-109-12     Sequence 12, Appl
40          721.5  20.0             310     15  US-10-193-409-12     Sequence 12, Appl
41          719    20.0             2524    15  US-10-190-115-25     Sequence 25, Appl
42          719    20.0             2524    15  US-10-369-072-25     Sequence 25, Appl
43          708.5  19.7             2447    15  US-10-190-115-28     Sequence 28, Appl
44          708.5  19.7             2447    15  US-10-369-072-28     Sequence 28, Appl
45          697    19.4             2321    14  US-10-356-625-2      Sequence 2, Appli














ALIGNMENTS 







RESULT 1 
US-10-092-390-4 

; Sequence 4, Application US/10092390
; Publication No. US20030013865A1
; GENERAL INFORMATION:
; APPLICANT: Yu, Xuanchuan
; APPLICANT: Miranda, Maricar
; TITLE OF INVENTION: Novel Human EGF-Family Proteins and Polynucleotides Encoding the Same
; FILE REFERENCE: LEX-0317-USA
; CURRENT APPLICATION NUMBER: US/10/092,390
; CURRENT FILING DATE: 2002-03-06
; PRIOR APPLICATION NUMBER: US 60/275,013
; PRIOR FILING DATE: 2001-03-12
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 4
; LENGTH: 586
; TYPE: PRT
; ORGANISM: homo sapiens
US-10-092-390-4

Query Match 100.0%; Score 3601; DB 14; Length 586; 

Best Local Similarity 100.0%; Pred. No. 2.4e-232; 

Matches 586; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 
Qy    1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60

Qy   61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120

Qy  121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180

Qy  181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240

Qy  241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300

Qy  301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360

Qy  361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420

Qy  421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480

Qy  481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540

Qy  541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586
        ||||||||||||||||||||||||||||||||||||||||||||||
Db  541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586



RESULT 2 
US-10-092-390-2 

; Sequence 2, Application US/10092390
; Publication No. US20030013865A1
; GENERAL INFORMATION:
; APPLICANT: Yu, Xuanchuan
; APPLICANT: Miranda, Maricar
; TITLE OF INVENTION: Novel Human EGF-Family Proteins and Polynucleotides Encoding the Same
; FILE REFERENCE: LEX-0317-USA
; CURRENT APPLICATION NUMBER: US/10/092,390
; CURRENT FILING DATE: 2002-03-06
; PRIOR APPLICATION NUMBER: US 60/275,013
; PRIOR FILING DATE: 2001-03-12
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 2
; LENGTH: 1140
; TYPE: PRT
; ORGANISM: homo sapiens
US-10-092-390-2



Query Match 100.0%; Score 3601; DB 14; Length 1140; 

Best Local Similarity 100.0%; Pred. No. 4.4e-232; 

Matches 586; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy    1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60

Qy   61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120

Qy  121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180

Qy  181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240

Qy  241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300

Qy  301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360

Qy  361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420

Qy  421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480

Qy  481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540

Qy  541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586
        ||||||||||||||||||||||||||||||||||||||||||||||
Db  541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586



RESULT 3 

US-10-052-648A-33 

Sequence 33, Application US/10052648A 
Publication No. US20040005558A1
GENERAL INFORMATION: 
APPLICANT: Anderson, David 
APPLICANT: Burgess, Catherine 
APPLICANT: Casman, Stacie 
APPLICANT: Colman, Steven 
APPLICANT: Edinger, Shlomit R. 
APPLICANT: Ellerman, Karen 
APPLICANT: Gerlach, Valerie 
APPLICANT: Gunther, Erik
APPLICANT: Kekuda, Ramesh 
APPLICANT: MacDougall, John R. 
APPLICANT: Mehraban, Fuad 
APPLICANT: Patturajan, Meera 
APPLICANT: Rothenberg, Mark 
APPLICANT: Shimkets, Richard 
APPLICANT: Smithson, Glennda 
APPLICANT: Spytek, Kimberly A. 
APPLICANT: Stone, David J. 
APPLICANT: Vernet, Corine A.M. 
APPLICANT: Zerhusen, Bryan D. 

TITLE OF INVENTION: PROTEINS, POLYNUCLEOTIDES ENCODING THEM AND METHODS OF 
TITLE OF INVENTION: USING THE SAME 
FILE REFERENCE: 21402-250 (CURA-550) 
CURRENT APPLICATION NUMBER: US/10/052,648A
CURRENT FILING DATE: 2002-12-09
PRIOR APPLICATION NUMBER: 60/262,454 
PRIOR FILING DATE: 2001-01-18 
PRIOR APPLICATION NUMBER: 60/272,920 
PRIOR FILING DATE: 2001-03-02 
PRIOR APPLICATION NUMBER: 60/284,549 
PRIOR FILING DATE: 2001-04-18 
PRIOR APPLICATION NUMBER: 60/303,229 
PRIOR FILING DATE: 2001-07-05 
PRIOR APPLICATION NUMBER: 60/262,892 
PRIOR FILING DATE: 2001-01-19 
PRIOR APPLICATION NUMBER: 60/263,605 
PRIOR FILING DATE: 2001-01-23 
PRIOR APPLICATION NUMBER: 60/269,098
PRIOR FILING DATE: 2001-02-15 



PRIOR APPLICATION NUMBER: 60/264,159
PRIOR FILING DATE: 2001-01-25 
PRIOR APPLICATION NUMBER: 60/265,517 
PRIOR FILING DATE: 2001-01-31 
PRIOR APPLICATION NUMBER: 60/271,855 
PRIOR FILING DATE: 2001-02-27 

Remaining Prior Application data removed - See File Wrapper or PALM. 
NUMBER OF SEQ ID NOS: 97
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 33 
LENGTH: 1140 
TYPE: PRT 

ORGANISM: Homo sapiens 
US-10-052-648A-33 

Query Match 100.0%; Score 3601; DB 15; Length 1140; 

Best Local Similarity 100.0%; Pred. No. 4.4e-232; 

Matches 586; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy    1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60

Qy   61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120

Qy  121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180

Qy  181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240

Qy  241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300

Qy  301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360

Qy  361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420

Qy  421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480

Qy  481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540

Qy  541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586
        ||||||||||||||||||||||||||||||||||||||||||||||
Db  541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586



RESULT 4 

US-10-052-648A-34 

Sequence 34, Application US/10052648A 
Publication No. US20040005558A1
GENERAL INFORMATION: 
APPLICANT: Anderson, David 
APPLICANT: Burgess, Catherine 
APPLICANT: Casman, Stacie 
APPLICANT: Colman, Steven 
APPLICANT: Edinger, Shlomit R. 
APPLICANT: Ellerman, Karen
APPLICANT: Gerlach, Valerie 
APPLICANT: Gunther, Erik 
APPLICANT: Kekuda, Ramesh 
APPLICANT: MacDougall, John R. 
APPLICANT: Mehraban, Fuad 
APPLICANT: Patturajan, Meera 
APPLICANT: Rothenberg, Mark 
APPLICANT: Shimkets, Richard 
APPLICANT: Smithson, Glennda 
APPLICANT: Spytek, Kimberly A. 
APPLICANT: Stone, David J. 
APPLICANT: Vernet, Corine A.M. 
APPLICANT: Zerhusen, Bryan D. 

TITLE OF INVENTION: PROTEINS, POLYNUCLEOTIDES ENCODING THEM AND METHODS OF 
TITLE OF INVENTION: USING THE SAME 
FILE REFERENCE: 21402-250 (CURA-550) 
CURRENT APPLICATION NUMBER: US/10/052,648A
CURRENT FILING DATE: 2002-12-09 
PRIOR APPLICATION NUMBER: 60/262,454 
PRIOR FILING DATE: 2001-01-18 
PRIOR APPLICATION NUMBER: 60/272,920 
PRIOR FILING DATE: 2001-03-02 
PRIOR APPLICATION NUMBER: 60/284,549 
PRIOR FILING DATE: 2001-04-18 
PRIOR APPLICATION NUMBER: 60/303,229 
PRIOR FILING DATE: 2001-07-05 
PRIOR APPLICATION NUMBER: 60/262,892
PRIOR FILING DATE: 2001-01-19 
PRIOR APPLICATION NUMBER: 60/263,605 
PRIOR FILING DATE: 2001-01-23 
PRIOR APPLICATION NUMBER: 60/269,098 
PRIOR FILING DATE: 2001-02-15 
PRIOR APPLICATION NUMBER: 60/264,159 
PRIOR FILING DATE: 2001-01-25 
PRIOR APPLICATION NUMBER: 60/265,517 
PRIOR FILING DATE: 2001-01-31 
PRIOR APPLICATION NUMBER: 60/271,855 
PRIOR FILING DATE: 2001-02-27 

Remaining Prior Application data removed - See File Wrapper or PALM. 
NUMBER OF SEQ ID NOS: 97
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 34 
LENGTH: 969 



TYPE: PRT 
ORGANISM: Homo sapiens
US-10-052-648A-34 



Query Match 58.2%; Score 2094; DB 15; Length 969; 

Best Local Similarity 65.4%; Pred. No. 1.2e-131; 

Matches 312; Conservative 58; Mismatches 107; Indels 0; Gaps 0 

Qy        109 CADKCVHGRCIAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACH 168
              | ::||||||::|:|| ||||||| :||| || |||||||::||||:||||||||||||
Db         28 CTEECVHGRCVSPDTCHCEPGWGGPDCSSGCDSDHWGPHCSNRCQCQNGALCNPITGACV 87

Qy        169 CAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPG 228
              |||||||||||: |  ||:|  |   |||::||:||   ||| | ||||| :||:|||||
Db         88 CAAGFRGWRCEELCAPGTHGKGCQLPCQCRHGASCDPRAGECLCAPGYTGVYCEELCPPG 147

Qy        229 KHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGT 288
               ||  || |||||||| |||:||||:|| || | || |||| | ||:||||:| ||:||
Db        148 SHGAHCELRCPCQNGGTCHHITGECACPPGWTGAVCAQPCPPGTFGQNCSQDCPCHHGGQ 207

Qy        289 CDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGE 348
              ||  ||||||: || |:|||:||| |::|  |:: | | |||:|   :||| || |: |
Db        208 CDHVTGQCHCTAGYMGDRCQEECPFGSFGFQCSQRCDCHNGGQCSPTTGACECEPGYKGP 267

Qy        349 RCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEA 408
              ||: |||||||:|  |   |||  :|| ||||::| | |:||||| :|||:|  |:||:
Db        268 RCQERLCPEGLHGPGCTLPCPCDADNTISCHPVTGACTCQPGWSGHHCNESCPVGYYGDG 327

Qy        409 CQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVD 468
              ||  |:||||||| |:|| ||||||| |  |:  |  |||| |||| | | |   |||||
Db        328 CQLPCTCQNGADCHSITGGCTCAPGFMGEVCAVSCAAGTYGPNCSSICSCNNGGTCSPVD 387

Qy        469 GSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELP 528
              |||||| || |:||:: |||||||  || :| | || ||: :||:|:| ||| |: ||||
Db        388 GSCTCKEGWQGLDCTLPCPSGTWGLNCNESCTCANGAACSPIDGSCSCTPGWLGDTCELP 447

Qy        529 CQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585
              | |||:||||:| ||||||||| | |||| || ||:|: ||| |  ||||||||: |
Db        448 CPDGTFGLNCSEHCDCSHADGCDPVTGHCCCLAGWTGIRCDSTCPPGRWGPNCSVSC 504



RESULT 5 

US-10-052-648A-35 

Sequence 35, Application US/10052648A 
Publication No. US20040005558A1
GENERAL INFORMATION: 
APPLICANT: Anderson, David 
APPLICANT: Burgess, Catherine 
APPLICANT: Casman, Stacie 
APPLICANT: Colman, Steven 
APPLICANT: Edinger, Shlomit R. 
APPLICANT: Ellerman, Karen 
APPLICANT: Gerlach, Valerie 
APPLICANT: Gunther, Erik 
APPLICANT: Kekuda, Ramesh 
APPLICANT: MacDougall, John R. 
APPLICANT: Mehraban, Fuad 



APPLICANT: Patturajan, Meera
APPLICANT: Rothenberg, Mark 
APPLICANT: Shimkets, Richard 
APPLICANT: Smithson, Glennda 
APPLICANT: Spytek, Kimberly A.
APPLICANT: Stone, David J. 
APPLICANT: Vernet, Corine A.M. 
APPLICANT: Zerhusen, Bryan D. 

TITLE OF INVENTION: PROTEINS, POLYNUCLEOTIDES ENCODING THEM AND METHODS OF 
TITLE OF INVENTION: USING THE SAME 
FILE REFERENCE: 21402-250 (CURA-550) 
CURRENT APPLICATION NUMBER: US/10/052,648A
CURRENT FILING DATE: 2002-12-09 
PRIOR APPLICATION NUMBER: 60/262,454 
PRIOR FILING DATE: 2001-01-18 
PRIOR APPLICATION NUMBER: 60/272,920 
PRIOR FILING DATE: 2001-03-02 
PRIOR APPLICATION NUMBER: 60/284,549 
PRIOR FILING DATE: 2001-04-18 
PRIOR APPLICATION NUMBER: 60/303,229 
PRIOR FILING DATE: 2001-07-05 
PRIOR APPLICATION NUMBER: 60/262,892 
PRIOR FILING DATE: 2001-01-19 
PRIOR APPLICATION NUMBER: 60/263,605 
PRIOR FILING DATE: 2001-01-23 
PRIOR APPLICATION NUMBER: 60/269,098 
PRIOR FILING DATE: 2001-02-15 
PRIOR APPLICATION NUMBER: 60/264,159 
PRIOR FILING DATE: 2001-01-25 
PRIOR APPLICATION NUMBER: 60/265,517 
PRIOR FILING DATE: 2001-01-31 
PRIOR APPLICATION NUMBER: 60/271,855 
PRIOR FILING DATE: 2001-02-27 

Remaining Prior Application data removed - See File Wrapper or PALM. 
NUMBER OF SEQ ID NOS: 97
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 35 
LENGTH: 969 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE:
NAME/KEY: VARIANT
LOCATION: (848)..(889)

OTHER INFORMATION: Where Xaa is any amino acid 
US-10-052-648A-35 



Query Match 58.1%; Score 2093; DB 15; Length 969;

Best Local Similarity 65.4%; Pred. No. 1.4e-131;

Matches 312; Conservative 58; Mismatches 107; Indels 0; Gaps 0;



Qy        109 CADKCVHGRCIAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACH 168
              | ::||||||::|:|| ||||||| :||| || |||||||::||||:||||||||||||
Db         28 CTEECVHGRCVSPDTCHCEPGWGGPDCSSGCDSDHWGPHCSNRCQCQNGALCNPITGACV 87

Qy        169 CAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPG 228
              |||||||||||: |  ||:|  |   |||::||:||   ||| | ||||| :||:|||||
Db         88 CAAGFRGWRCEELCAPGTHGKGCQLPCQCRHGASCDPRAGECLCAPGYTGVYCEELCPPG 147

Qy        229 KHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGT 288
               ||  || |||||||| |||:||||:|| || | || |||| | ||:||||:| ||:||
Db        148 SHGAHCELRCPCQNGGTCHHITGECACPPGWTGAVCAQPCPPGTFGQNCSQDCPCHHGGQ 207

Qy        289 CDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGE 348
              ||  ||||||: || |:|||:||| |::|  |:: | | |||:|   :||| || |: |
Db        208 CDHVTGQCHCTAGYMGDRCQEECPFGSFGFQCSQHCDCHNGGQCSPTTGACECEPGYKGP 267

Qy        349 RCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEA 408
              ||: |||||||:|  |   |||  :|| ||||::| | |:||||| :|||:|  |:||:
Db        268 RCQERLCPEGLHGPGCTLPCPCDADNTISCHPVTGACTCQPGWSGHHCNESCPVGYYGDG 327

Qy        409 CQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVD 468
              ||  |:||||||| |:|| ||||||| |  |:  |  |||| |||| | | |   |||||
Db        328 CQLPCTCQNGADCHSITGGCTCAPGFMGEVCAVSCAAGTYGPNCSSICSCNNGGTCSPVD 387

Qy        469 GSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELP 528
              |||||| || |:||:: |||||||  || :| | || ||: :||:|:| ||| |: ||||
Db        388 GSCTCKEGWQGLDCTLPCPSGTWGLNCNESCTCANGAACSPIDGSCSCTPGWLGDTCELP 447

Qy        529 CQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585
              | |||:||||:| ||||||||| | |||| || ||:|: ||| |  ||||||||: |
Db        448 CPDGTFGLNCSEHCDCSHADGCDPVTGHCCCLAGWTGIRCDSTCPPGRWGPNCSVSC 504



RESULT 6 

US-10-052-648A-10 

Sequence 10, Application US/10052648A 
Publication No. US20040005558A1 
GENERAL INFORMATION: 
APPLICANT: Anderson, David 
APPLICANT: Burgess, Catherine 
APPLICANT: Casman, Stacie
APPLICANT: Colman, Steven 
APPLICANT: Edinger, Shlomit R. 
APPLICANT: Ellerman, Karen 
APPLICANT: Gerlach, Valerie 
APPLICANT: Gunther, Erik 
APPLICANT: Kekuda, Ramesh 
APPLICANT: MacDougall, John R. 
APPLICANT: Mehraban, Fuad 
APPLICANT: Patturajan, Meera 
APPLICANT: Rothenberg, Mark 
APPLICANT: Shimkets, Richard 
APPLICANT: Smithson, Glennda 
APPLICANT: Spytek, Kimberly A. 
APPLICANT: Stone, David J. 
APPLICANT: Vernet, Corine A.M. 
APPLICANT: Zerhusen, Bryan D. 

TITLE OF INVENTION: PROTEINS, POLYNUCLEOTIDES ENCODING THEM AND METHODS OF 
TITLE OF INVENTION: USING THE SAME 
FILE REFERENCE: 21402-250 (CURA-550) 
CURRENT APPLICATION NUMBER: US/10/052,648A
CURRENT FILING DATE: 2002-12-09 
PRIOR APPLICATION NUMBER: 60/262,454 
PRIOR FILING DATE: 2001-01-18 



PRIOR APPLICATION NUMBER: 60/272,920 
PRIOR FILING DATE: 2001-03-02 
PRIOR APPLICATION NUMBER: 60/284,549 
PRIOR FILING DATE: 2001-04-18 
PRIOR APPLICATION NUMBER: 60/303,229 
PRIOR FILING DATE: 2001-07-05 
PRIOR APPLICATION NUMBER: 60/262,892 
PRIOR FILING DATE: 2001-01-19 
PRIOR APPLICATION NUMBER: 60/263,605 
PRIOR FILING DATE: 2001-01-23 
PRIOR APPLICATION NUMBER: 60/269,098 
PRIOR FILING DATE: 2001-02-15 
PRIOR APPLICATION NUMBER: 60/264,159 
PRIOR FILING DATE: 2001-01-25 
PRIOR APPLICATION NUMBER: 60/265,517 
PRIOR FILING DATE: 2001-01-31 
PRIOR APPLICATION NUMBER: 60/271,855 
PRIOR FILING DATE: 2001-02-27 

Remaining Prior Application data removed - See File Wrapper or PALM. 
NUMBER OF SEQ ID NOS: 97
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 10 
LENGTH: 1037 
TYPE: PRT 

ORGANISM: Homo sapiens 
US-10-052-648A-10 

Query Match 51.4%; Score 1851; DB 15; Length 1037; 

Best Local Similarity 52.0%; Pred. No. 2e-115; 

Matches 299; Conservative 61; Mismatches 209; Indels 6; Gaps 3; 

Qy         14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW---FKCT 70
              |||   :  |  ||  ||| || |||:: | :||:  ||  :    |     |     |
Db          9 LLLAVGLRLAGTLNPSDPNTCSFWESFTTTTKESHSRPFSLLPSEPCE--RPWEGPHTCP 66

Qy         71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130
              :  | ||| ||   || :|:: ||| ||||| | ||| || :||||||:||| ||| |||
Db         67 QPTVVYRTVYRQVVKTDHRQRLQCCHGFYESREFCVPLCAQECVHGRCVAPNQCQCVPGW 126

Qy        131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190
               | :||| |    ||| |   | | | : |:| :| | | :| :   |   |  | ||
Db        127 RGDDCSSECAPGMWGPQCDKPCSCGNNSSCDPKSGVCSCPSGLQPPNCLQPCTPGYYGPA 186

Qy        191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250
              |  |||| :|| ||  || | ||   ||  |:  |  |  |  |    |||||||
Db        187 CQFRCQC-HGAPCDPQTGACFCPAERTGPSCDVSCSQGTSGFFCPSTHPCQNGGVFQTPQ 245

Qy        251 GECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDE 310
              | |||| |||||:|  |||||  | ||||||:||||| ||  |||| |:|||||:||::|
Db        246 GSCSCPPGWMGTICSLPCPEGFHGPNCSQECRCHNGGLCDRFTGQCRCAPGYTGDRCREE 305

Qy        311 CPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPC 370
              |||| :|  ||||| |    :|:  :|||||| || |:||  ||||:| ||: |   | |
Db        306 CPVGRFGQDCAETCDCAPDARCFPANGACLCEHGFTGDRCTDRLCPDGFYGLSCQAPCTC 365

Qy        371 HLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTC 430
                |:: |||||:|||:| |||:||:|||:|    :|  ||: | | :|  | : :| | |
Db        366 DREHSLSCHPMNGECSCLPGWAGLHCNESCPQDTHGPGCQEYCLCLHGGVCQATSGLCQC 425

Qy        431 APGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGT 490
              |||: |  |:: ||  |||:|||:|| |:|   |||:|| | || ||   :||: || ||
Db        426 APGYTGPHCASLCPPDTYGVNCSARCSCENAIACSPIDGECVCKEGWQRGNCSVPCPPGT 485

Qy        491 WGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGC 550
              ||| || :||| :   |:   | ||| ||| |  |:|||  | :|  || |||| |:|||
Db        486 WGFSCNASCQCAHEAVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASRCDCDHSDGC 545

Qy        551 HPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585
               |  | |:|  || |  |   | || || |||  |
Db        546 DPVHGRCQCQAGWMGARCHLSCPEGLWGVNCSNTC 580



RESULT 7 

US-10-052-648A-8 

Sequence 8, Application US/10052648A 
Publication No. US20040005558A1
GENERAL INFORMATION: 
APPLICANT: Anderson, David 
APPLICANT: Burgess, Catherine 
APPLICANT: Casman, Stacie
APPLICANT: Colman, Steven
APPLICANT: Edinger, Shlomit R. 
APPLICANT: Ellerman, Karen 
APPLICANT: Gerlach, Valerie 
APPLICANT: Gunther, Erik 
APPLICANT: Kekuda, Ramesh 
APPLICANT: MacDougall, John R. 
APPLICANT: Mehraban, Fuad 
APPLICANT: Patturajan, Meera 
APPLICANT: Rothenberg, Mark 
APPLICANT: Shimkets, Richard 
APPLICANT: Smithson, Glennda
APPLICANT: Spytek, Kimberly A.
APPLICANT: Stone, David J. 
APPLICANT: Vernet, Corine A.M. 
APPLICANT: Zerhusen, Bryan D. 

TITLE OF INVENTION: PROTEINS, POLYNUCLEOTIDES ENCODING THEM AND METHODS OF 
TITLE OF INVENTION: USING THE SAME 
FILE REFERENCE: 21402-250 (CURA-550) 
CURRENT APPLICATION NUMBER: US/10/052,648A
CURRENT FILING DATE: 2002-12-09 
PRIOR APPLICATION NUMBER: 60/262,454 
PRIOR FILING DATE: 2001-01-18 
PRIOR APPLICATION NUMBER: 60/272,920 
PRIOR FILING DATE: 2001-03-02 
PRIOR APPLICATION NUMBER: 60/284,549 
PRIOR FILING DATE: 2001-04-18 
PRIOR APPLICATION NUMBER: 60/303,229 
PRIOR FILING DATE: 2001-07-05 
PRIOR APPLICATION NUMBER: 60/262,892 
PRIOR FILING DATE: 2001-01-19 
PRIOR APPLICATION NUMBER: 60/263,605 
PRIOR FILING DATE: 2001-01-23 
PRIOR APPLICATION NUMBER: 60/269,098 



PRIOR FILING DATE: 2001-02-15 
PRIOR APPLICATION NUMBER: 60/264,159 
PRIOR FILING DATE: 2001-01-25 
PRIOR APPLICATION NUMBER: 60/265,517 
PRIOR FILING DATE: 2001-01-31 
PRIOR APPLICATION NUMBER: 60/271,855 
PRIOR FILING DATE: 2001-02-27 

Remaining Prior Application data removed - See File Wrapper or PALM. 
NUMBER OF SEQ ID NOS: 97
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 8 
LENGTH: 1037 
TYPE: PRT 

ORGANISM: Homo sapiens 
US-10-052-648A-8 

Query Match 50.8%; Score 1830; DB 15; Length 1037; 

Best Local Similarity 51.7%; Pred. No. 5e-114; 

Matches 297; Conservative 61; Mismatches 211; Indels 6; Gaps 3; 

Qy         14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW---FKCT 70
              |||   :  |  ||  ||| || |||:: | :||:  ||  :    |     |     |
Db          9 LLLAVGLRLAGTLNPSDPNTCSFWESFTTTTKESHSRPFSLLPSEPCE--RPWEGPHTCP 66

Qy         71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130
              :  | ||| ||   || :|:: ||| |||||   ||| || :||||||:||| ||| |||
Db         67 QPTVVYRTVYRQVVKTDHRQRLQCCHGFYESRGFCVPLCAQECVHGRCVAPNQCQCVPGW 126

Qy        131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190
               | :||| |    ||| |   | | | : |:| :| | | :| :   |   |  | ||
Db        127 RGDDCSSECAPGMWGPQCDKPCSCGNNSSCDPKSGVCSCPSGLQPPNCLQPCTPGYYGPA 186

Qy        191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250
              |  |||| :|| ||  || | ||   ||  |:  |  |  |  |    |||||||
Db        187 CQFRCQC-HGAPCDPQTGACFCPAERTGPSCDVSCSQGTSGFFCPSTHPCQNGGVFQTPQ 245

Qy        251 GECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDE 310
              | |||| |||||:|  |||||  | ||||||:||||| ||  |||| |:|||||:||::|
Db        246 GSCSCPPGWMGTICSLPCPEGFHGPNCSQECRCHNGGLCDRFTGQCRCAPGYTGDRCREE 305

Qy        311 CPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPC 370
              |||| :|  ||||| |    :|:  :|||||| || |:||  ||||:| ||: |     |
Db        306 CPVGRFGQDCAETCDCAPDARCFPANGACLCEHGFTGDRCTDRLCPDGFYGLSCQAPRTC 365

Qy        371 HLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTC 430
                |:: |||||:|||:| |||:||:|||:|    :|  ||: | | :|  | : :| | |
Db        366 DREHSLSCHPMNGECSCLPGWAGLHCNESCPQDTHGPGCQEHCLCLHGGVCQATSGLCQC 425

Qy        431 APGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGT 490
              |||: |  |:: ||  |||:|||:|| |:|   |||:|| | || ||   :||: || ||
Db        426 APGYTGPHCASLCPPDTYGVNCSARCSCENAIACSPIDGECVCKEGWQRGNCSVPCPPGT 485

Qy        491 WGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGC 550
              ||| || :||| :   |:   | ||| ||| |  |:|||  | :|  || |||| |:|||
Db        486 WGFSCNASCQCAHEAVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASRCDCDHSDGC 545

Qy        551 HPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585
               |  | |:|  || |  |   | || || |||  |
Db        546 DPVHGRCQCQAGWMGARCHLSCPEGLWGVNCSNTC 580



RESULT 8 

US-10-052-648A-31 

Sequence 31, Application US/10052648A 
Publication No. US20040005558A1
GENERAL INFORMATION: 
APPLICANT: Anderson, David 
APPLICANT: Burgess, Catherine 
APPLICANT: Casman, Stacie
APPLICANT: Colman, Steven
APPLICANT: Edinger, Shlomit R.
APPLICANT: Ellerman, Karen
APPLICANT: Gerlach, Valerie 
APPLICANT: Gunther, Erik 
APPLICANT: Kekuda, Ramesh 
APPLICANT: MacDougall, John R. 
APPLICANT: Mehraban, Fuad 
APPLICANT: Patturajan, Meera 
APPLICANT: Rothenberg, Mark 
APPLICANT: Shimkets, Richard 
APPLICANT: Smithson, Glennda 
APPLICANT: Spytek, Kimberly A. 
APPLICANT: Stone, David J. 
APPLICANT: Vernet, Corine A.M. 
APPLICANT: Zerhusen, Bryan D. 

TITLE OF INVENTION: PROTEINS, POLYNUCLEOTIDES ENCODING THEM AND METHODS OF 
TITLE OF INVENTION: USING THE SAME 
FILE REFERENCE: 21402-250 (CURA-550) 
CURRENT APPLICATION NUMBER: US/10/052,648A
CURRENT FILING DATE: 2002-12-09 
PRIOR APPLICATION NUMBER: 60/262,454 
PRIOR FILING DATE: 2001-01-18 
PRIOR APPLICATION NUMBER: 60/272,920 
PRIOR FILING DATE: 2001-03-02 
PRIOR APPLICATION NUMBER: 60/284,549 
PRIOR FILING DATE: 2001-04-18 
PRIOR APPLICATION NUMBER: 60/303,229 
PRIOR FILING DATE: 2001-07-05 
PRIOR APPLICATION NUMBER: 60/262,892 
PRIOR FILING DATE: 2001-01-19 
PRIOR APPLICATION NUMBER: 60/263,605 
PRIOR FILING DATE: 2001-01-23 
PRIOR APPLICATION NUMBER: 60/269,098 
PRIOR FILING DATE: 2001-02-15 
PRIOR APPLICATION NUMBER: 60/264,159 
PRIOR FILING DATE: 2001-01-25 
PRIOR APPLICATION NUMBER: 60/265,517 
PRIOR FILING DATE: 2001-01-31 
PRIOR APPLICATION NUMBER: 60/271,855 
PRIOR FILING DATE: 2001-02-27 

Remaining Prior Application data removed - See File Wrapper or PALM. 
NUMBER OF SEQ ID NOS: 97
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 31 






LENGTH: 1034 
TYPE: PRT 

ORGANISM: Mus musculus 
US-10-052-648A-31 

Query Match 50.8%; Score 1828; DB 15; Length 1034;

Best Local Similarity 52.0%; Pred. No. 6.8e-114; 

Matches 299; Conservative 59; Mismatches 211; Indels 6; Gaps 3; 

Qy         14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW---FKCT 70
              |||   :     ||  |||||: |||:: | :||:  ||  :   ||     |     |
Db          7 LLLALGLRLTGTLNSNDPNVCTFWESFTTTTKESHLRPFSLLPAESCH--RPWEDPHTCA 64

Qy         71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130
              :  | ||| ||   |   | : ||| |:|||   ||| || :||||||:||| ||| |||
Db         65 QPTVVYRTVYRQVVKMDSRPRLQCCRGYYESRGACVPLCAQECVHGRCVAPNQCQCAPGW 124

Qy        131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190
               | :||| |    ||| |   | | | : |:| :||| | :| :   |   |  | ||
Db        125 RGGDCSSECAPGMWGPQCDKFCHCGNNSSCDPKSGACFCPSGLQPPNCLQPCPAGHYGPA 184

Qy        191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250
              |   |||  ||:||   | | ||||  |  |   |  |  |  | :  |||||||
Db        185 CQFDCQCY-GASCDPQDGACFCPPGRAGPSCNVPCSQGTDGFFCPRTYPCQNGGVPQGSQ 243

Qy        251 GECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDE 310
              | |||| |||| :|  |||||  | ||:|||:||||| ||  ||||||:||| |:|||:|
Db        244 GSCSCPPGWMGVICSLPCPEGFHGPNCTQECRCHNGGLCDRFTGQCHCAPGYIGDRCQEE 303

Qy        311 CPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPC 370
              |||| :|  ||||| |  | :|:  :|||||| || |:||  ||||:| ||: | : | |
Db        304 CPVGRFGQDCAETCDCAPGARCFPANGACLCEHGFTGDRCTERLCPDGRYGLSCQEPCTC 363

Qy        371 HLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTC 430
                |:: ||||| |||:|:|||:||:|||:|    :|  ||: | | :|  | : :| | |
Db        364 DPEHSLSCHPMHGECSCQPGWAGLHCNESCPQDTHGPGCQEHCLCLHGGLCLADSGLCRC 423

Qy        431 APGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGT 490
              |||: |  |:  ||  |||||||||| |:|   |||:||:| || ||   :||: || ||
Db        424 APGYTGPHCANLCPPDTYGINCSSRCSCENAIACSPIDGTCICKEGWQRGNCSVPCPLGT 483

Qy        491 WGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGC 550
              ||| || :||| : | |:   | ||| ||| |  |:|||  | :|  ||  ||| |:|||
Db        484 WGFNCNASCQCAHDGVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASVCDCDHSDGC 543

Qy        551 HPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585
               |  | |||  || |  |   | || || |||  |
Db        544 DPVHGQCRCQAGWMGTRCHLPCPEGFWGANCSNTC 578



RESULT 9 

US-10-052-648A-32 

; Sequence 32, Application US/10052648A 
; Publication No. US20040005558A1
; GENERAL INFORMATION: 

; APPLICANT: Anderson, David
; APPLICANT: Burgess, Catherine
; APPLICANT: Casman, Stacie
; APPLICANT: Colman, Steven
; APPLICANT: Edinger, Shlomit R.
; APPLICANT: Ellerman, Karen
; APPLICANT: Gerlach, Valerie
; APPLICANT: Gunther, Erik
; APPLICANT: Kekuda, Ramesh
; APPLICANT: MacDougall, John R.
; APPLICANT: Mehraban, Fuad
; APPLICANT: Patturajan, Meera
; APPLICANT: Rothenberg, Mark
; APPLICANT: Shimkets, Richard
; APPLICANT: Smithson, Glennda
; APPLICANT: Spytek, Kimberly A.
; APPLICANT: Stone, David J.
; APPLICANT: Vernet, Corine A.M.
; APPLICANT: Zerhusen, Bryan D.



; TITLE OF INVENTION: PROTEINS, POLYNUCLEOTIDES ENCODING THEM AND METHODS OF 

TITLE OF INVENTION: USING THE SAME 
; FILE REFERENCE: 21402-250 (CURA-550) 
; CURRENT APPLICATION NUMBER: US/10/052,648A
; CURRENT FILING DATE: 2002-12-09 
; PRIOR APPLICATION NUMBER: 60/262,454 
; PRIOR FILING DATE: 2001-01-18 
; PRIOR APPLICATION NUMBER: 60/272,920 
; PRIOR FILING DATE: 2001-03-02 

PRIOR APPLICATION NUMBER: 60/284,549 
; PRIOR FILING DATE: 2001-04-18 
; PRIOR APPLICATION NUMBER: 60/303,229 

PRIOR FILING DATE: 2001-07-05 
; PRIOR APPLICATION NUMBER: 60/262,892 
; PRIOR FILING DATE: 2001-01-19 
; PRIOR APPLICATION NUMBER: 60/263,605 
; PRIOR FILING DATE: 2001-01-23 
; PRIOR APPLICATION NUMBER: 60/269,098 
; PRIOR FILING DATE: 2001-02-15 
; PRIOR APPLICATION NUMBER: 60/264,159 
; PRIOR FILING DATE: 2001-01-25 
; PRIOR APPLICATION NUMBER: 60/265,517 
; PRIOR FILING DATE: 2001-01-31 
; PRIOR APPLICATION NUMBER: 60/271,855 
; PRIOR FILING DATE: 2001-02-27

Remaining Prior Application data removed - See File Wrapper or PALM. 
; NUMBER OF SEQ ID NOS: 97
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 32 

; LENGTH: 1034
; TYPE: PRT 

; ORGANISM: Mus musculus
US-10-052-648A-32 

Query Match 50.7%; Score 1824; DB 15; Length 1034; 

Best Local Similarity 51.8%; Pred. No. 1.3e-113; 

Matches 298; Conservative 59; Mismatches 212; Indels 6; Gaps 3 
Qy         14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW---FKCT 70
              |||   :     ||  |||||: |||:: | :||:  ||  :   ||     |     |
Db          7 LLLALGLRLTGTLNSNDPNVCTFWESFTTTTKESHLRPFSLLPAESCH--RPWEDPHTCA 64

Qy         71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130
              :  | ||| ||   |   | : ||| |:|||   ||| || :||||||:||| ||| |||
Db         65 QPTVVYRTVYRQVVKMDSRPRLQCCRGYYESRGACVPLCAQECVHGRCVAPNQCQCAPGW 124

Qy        131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190
               | :||| |    ||| |   | | | : |:| :| | | :| :   |   |  | ||
Db        125 RGGDCSSECAPGMWGPQCDKFCHCGNNSSCDPKSGTCFCPSGLQPPNCLQPCPAGHYGPA 184

Qy        191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250
              |   |||  ||:||   | | ||||  |  |   |  |  |  | :  |||||||
Db        185 CQFDCQCY-GASCDPQDGACFCPPGRAGPSCNVPCSQGTDGFFCPRTYPCQNGGVPQGSQ 243

Qy        251 GECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDE 310
              | |||| |||| :|  |||||  | ||:|||:||||| ||  ||||||:||| |:|||:|
Db        244 GSCSCPPGWMGVICSLPCPEGFHGPNCTQECRCHNGGLCDRFTGQCHCAPGYIGDRCQEE 303

Qy        311 CPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPC 370
              |||| :|  ||||| |  | :|:  :|||||| || |:||  ||||:| ||: | : | |
Db        304 CPVGRFGQDCAETCDCAPGARCFPANGACLCEHGFTGDRCTERLCPDGRYGLSCQEPCTC 363

Qy        371 HLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTC 430
                |:: ||||| |||:|:|||:||:|||:|    :|  ||: | | :|  | : :| | |
Db        364 DPEHSLSCHPMHGECSCQPGWAGLHCNESCPQDTHGPGCQEHCLCLHGGLCLADSGLCRC 423

Qy        431 APGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGT 490
              |||: |  |:  ||  |||||||||| |:|   |||:||:| || ||   :||: || ||
Db        424 APGYTGPHCANLCPPDTYGINCSSRCSCENAIACSPIDGTCICKEGWQRGNCSVPCPLGT 483

Qy        491 WGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGC 550
              ||| || :||| : | |:   | ||| ||| |  |:|||  | :|  ||  ||| |:|||
Db        484 WGFNCNASCQCAHDGVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASVCDCDHSDGC 543

Qy        551 HPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585
               |  | |||  || |  |   | || || |||  |
Db        544 DPVHGQCRCQAGWMGTRCHLPCPEGFWGANCSNTC 578


RESULT 10 
US-10-052-648A-2 

Sequence 2, Application US/10052648A 
Publication No. US20040005558A1 
GENERAL INFORMATION: 



APPLICANT: Anderson, David
APPLICANT: Burgess, Catherine
APPLICANT: Casman, Stacie
APPLICANT: Colman, Steven
APPLICANT: Edinger, Shlomit R.
APPLICANT: Ellerman, Karen
APPLICANT: Gerlach, Valerie
APPLICANT: Gunther, Erik
APPLICANT: Kekuda, Ramesh
APPLICANT: MacDougall, John R.
APPLICANT: Mehraban, Fuad
APPLICANT: Patturajan, Meera
APPLICANT: Rothenberg, Mark



APPLICANT: Shimkets, Richard 
APPLICANT: Smithson, Glennda 
APPLICANT: Spytek, Kimberly A. 
APPLICANT: Stone, David J. 
APPLICANT: Vernet, Corine A.M. 
APPLICANT: Zerhusen, Bryan D. 

TITLE OF INVENTION: PROTEINS, POLYNUCLEOTIDES ENCODING THEM AND METHODS OF 
TITLE OF INVENTION: USING THE SAME 
FILE REFERENCE: 21402-250 (CURA-550) 
CURRENT APPLICATION NUMBER: US/10/052,648A
CURRENT FILING DATE: 2002-12-09 
PRIOR APPLICATION NUMBER: 60/262,454 
PRIOR FILING DATE: 2001-01-18 
PRIOR APPLICATION NUMBER: 60/272,920 
PRIOR FILING DATE: 2001-03-02 
PRIOR APPLICATION NUMBER: 60/284,549 
PRIOR FILING DATE: 2001-04-18 
PRIOR APPLICATION NUMBER: 60/303,229 
PRIOR FILING DATE: 2001-07-05 
PRIOR APPLICATION NUMBER: 60/262,892 
PRIOR FILING DATE: 2001-01-19 
PRIOR APPLICATION NUMBER: 60/263,605 
PRIOR FILING DATE: 2001-01-23 
PRIOR APPLICATION NUMBER: 60/269,098 
PRIOR FILING DATE: 2001-02-15 
PRIOR APPLICATION NUMBER: 60/264,159 
PRIOR FILING DATE: 2001-01-25 
PRIOR APPLICATION NUMBER: 60/265,517 
PRIOR FILING DATE: 2001-01-31 
PRIOR APPLICATION NUMBER: 60/271,855 
PRIOR FILING DATE: 2001-02-27 

Remaining Prior Application data removed - See File Wrapper or PALM. 
NUMBER OF SEQ ID NOS: 97
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 2
LENGTH: 1020
TYPE: PRT 

ORGANISM: Homo sapiens 
US-10-052-648A-2 

Query Match 50.3%; Score 1811; DB 15; Length 1020; 

Best Local Similarity 49.4%; Pred. No. 9.1e-113; 

Matches 297; Conservative 61; Mismatches 211; Indels 32; Gaps 4 

Qy         14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW---FKCT 70
              |||   :  |  ||  ||| || |||:: | :||:  ||  :    |     |     |
Db          9 LLLAVGLRLAGTLNPSDPNTCSFWESFTTTTKESHSRPFSLLPSEPCE--RPWEGPHTCP 66

Qy         71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130
              :  | ||| ||   || :|:: ||| |||||   ||| || :||||||:||| ||| |||
Db         67 QPTVVYRTVYRQVVKTDHRQRLQCCHGFYESRGFCVPLCAQECVHGRCVAPNQCQCVPGW 126

Qy        131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190
               | :||| |    ||| |   | | | : |:| :| | | :| :   |   |  | ||
Db        127 RGDDCSSECAPGMWGPQCDKPCSCGNNSSCDPKSGVCSCPSGLQPPNCLQPCTPGYYGPA 186

Qy        191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250
              |  |||| :|| ||  || | ||   ||  |:  |  |  |  |     ||||||
Db        187 CQFRCQC-HGAPCDPQTGACFCPAERTGPSCDVSCSQGTSGFFCPSTHSCQNGGVFQTPQ 245

Qy        251 GECSCPSGWM--------------------------GTVCGQPCPEGRFGKNCSQECQCH 284
              | |||| |||                          ||:|  |||||  | ||||||:||
Db        246 GSCSCPPGWMVWRVGPVGMGCGSGENSVGGAKQGSKGTICSLPCPEGFHGPNCSQECRCH 305

Qy        285 NGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAG 344
              ||| ||  |||| |:|||||:||::||||| :|  ||||| |    :|:  :|||||| |
Db        306 NGGLCDRFTGQCRCAPGYTGDRCREECPVGRFGQDCAETCDCAPDARCFPANGACLCEHG 365

Qy        345 FAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGF 404
              | |:||  ||||:| ||: |   | |  |:: |||||:|||:| |||:||:|||:|
Db        366 FTGDRCTDRLCPDGFYGLSCQAPCTCDREHSLSCHPMNGECSCLPGWAGLHCNESCPQDT 425

Qy        405 YGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVC 464
              :|  ||: | | :|  | : :| | ||||: |  |:: ||  |||:|||:|| |:|   |
Db        426 HGPGCQEHCLCLHGGVCQATSGLCQCAPGYTGPHCASLCPPDTYGVNCSARCSCENAIAC 485

Qy        465 SPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEK 524
              ||:|| | || ||   :||: || ||||| || :||| :   |:   | ||| ||| |
Db        486 SPIDGECVCKEGWQRGNCSVPCPPGTWGFSCNASCQCAHEAVCSPQTGACTCTPGWHGAH 545

Qy        525 CELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLP 584
              |:|||  | :|  || |||| |:||| |  | |:|  || |  |   | || || |||
Db        546 CQLPCPKGQFGEGCASRCDCDHSDGCDPVHGRCQCQAGWMGARCHLSCPEGLWGVNCSNT 605

Qy        585 C 585
              |
Db        606 C 606





RESULT 11 
US-10-052-648A-4 

; Sequence 4, Application US/10052648A 
; Publication No. US20040005558A1
; GENERAL INFORMATION: 



APPLICANT: Anderson, David
APPLICANT: Burgess, Catherine
APPLICANT: Casman, Stacie
APPLICANT: Colman, Steven
APPLICANT: Edinger, Shlomit R.
APPLICANT: Ellerman, Karen
APPLICANT: Gerlach, Valerie
APPLICANT: Gunther, Erik
APPLICANT: Kekuda, Ramesh
APPLICANT: MacDougall, John R.
APPLICANT: Mehraban, Fuad
APPLICANT: Patturajan, Meera
APPLICANT: Rothenberg, Mark
APPLICANT: Shimkets, Richard
APPLICANT: Smithson, Glennda
APPLICANT: Spytek, Kimberly A.
APPLICANT: Stone, David J.
APPLICANT: Vernet, Corine A.M.
APPLICANT: Zerhusen, Bryan D.



TITLE OF INVENTION: PROTEINS, POLYNUCLEOTIDES ENCODING THEM AND METHODS OF 



TITLE OF INVENTION: USING THE SAME 
FILE REFERENCE: 21402-250 (CURA-550) 
CURRENT APPLICATION NUMBER: US/10/052,648A
CURRENT FILING DATE: 2002-12-09 
PRIOR APPLICATION NUMBER: 60/262,454 
PRIOR FILING DATE: 2001-01-18 
PRIOR APPLICATION NUMBER: 60/272,920 
PRIOR FILING DATE: 2001-03-02 
PRIOR APPLICATION NUMBER: 60/284,549 
PRIOR FILING DATE: 2001-04-18 
PRIOR APPLICATION NUMBER: 60/303,229 
PRIOR FILING DATE: 2001-07-05 
PRIOR APPLICATION NUMBER: 60/262,892 
PRIOR FILING DATE: 2001-01-19 
PRIOR APPLICATION NUMBER: 60/263,605 
PRIOR FILING DATE: 2001-01-23 
PRIOR APPLICATION NUMBER: 60/269,098 
PRIOR FILING DATE: 2001-02-15 
PRIOR APPLICATION NUMBER: 60/264,159 
PRIOR FILING DATE: 2001-01-25 
PRIOR APPLICATION NUMBER: 60/265,517 
PRIOR FILING DATE: 2001-01-31 
PRIOR APPLICATION NUMBER: 60/271,855 
PRIOR FILING DATE: 2001-02-27 

Remaining Prior Application data removed - See File Wrapper or PALM. 
NUMBER OF SEQ ID NOS: 97
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 4
LENGTH: 928
TYPE: PRT 

ORGANISM: Homo sapiens 
US-10-052-648A-4 

Query Match 50.0%; Score 1799; DB 15; Length 928; 

Best Local Similarity 49.3%; Pred. No. 5.3e-112; 

Matches 296; Conservative 61; Mismatches 212; Indels 32; Gaps 4 

Qy 14 LLLCHWIGTASPLNLEDPNVCSHWES YSVTVQESYPHPFDQI YYTSCTDILNW FKCT 70

III : I I I I I I I I I I I : : I : I I : I I : i I I 

Db 9 LLLAVGLRLAGTLNPSDPNTCSFWESFTTTTKESHSRPFSLLPSEPCE--RPWEGPHTCP 66 

Qy 71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130 

: I I I I I I I I : I :: I I I I I I I I I I I I I : I I I I I I : I I I I I I I I I 

Db 67 QPTVVYRTVYRQVVKTDHRQRLQCCHGFYESRGFCVPLCAQECVHGRCVAPNQCQCVPGW 126

Qy 131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190 

I : I I I I I I I I | | | : I : | : | | | : I : I I 1 I I 

Db 127 RGDDCSSECAPGMWGPQCDKPCSCGNNSSCDPKSGVCSCPSGLQPPNCLQPCTPGYYGPA 186 

Qy 191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250 

I 1111:1111 I I I I I II I : I I I I I I 1 I I I 

Db 187 CQFRCQC-HGAPCDPQTGACFCPAERTGPSCDVSCSQGTSGFFCPSTHSCQNGGVFQTPQ 245 

Qy 251 GECSCPSGWM GTVCGQPCPEGRFGKNCSQECQCH 284 

I I I I I I I I I I : I I I I I I I I I I I I I : M 

Db 246 GSCSCPPGWMVWRVGPVGMGCGSGENSVGGAKQGSKGTICSLPCPEGFHGPNCSQECRCH 305



Qy 285 NGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAG 344

I I I I I I I I I I : I I I I I : M : : I I I I I : I I I I I I I * I • : I I I I I I I

Db 306 NGGLCDRFTGQCRCAPGYTGDRCREECPVGRFGQDCAETCDCAPDARCFPANGACLCEHG 365

Qy 345 FAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGF 404

I I : I I | | | | : | | I : I I I : : I I i I I : I I I : I I I I : i I : I I I : I

Db 366 FTGDRCTDRLCPDGFYGLSCQAPRTCDREHSLSCHPMNGECSCLPGWAGLHCNESCPQDT 425

Qy 405 YGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVC 464

: I I I : I I : I I : : I I I I I I : I I : : I I I I I : I I i : I I I : I I

Db 426 HGPGCQEHCLCLHGGVCQATSGLCQCAPGYTGPHCASLCPPDTYGVNCSARCSCENAIAC 485

Qy 465 SPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEK 524

||:||||||| : I I : 1 I I I II I I I : I I I : I : I I I I I I I I

Db 486 SPIDGECVCKEGWQRGNCSVPCPPGTWGFSCNASCQCAHEAVCSPQTGACTCTPGWHGAH 545

Qy 525 CELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLP 584

hill I : I I I I I I I I : I I I I I N 111 I I I I I I I I I

Db 546 CQLPCPKGQFGEGCASRCDCDHSDGCDPVHGRCQCQAGWMGARCHLSCPEGLWGVNCSNT 605

Qy 585 C 585

|

Db 606 C 606



RESULT 12 
US-10-052-648A-6 

Sequence 6, Application US/10052648A 
Publication No. US20040005558A1 
GENERAL INFORMATION: 
APPLICANT: Anderson, David 
APPLICANT: Burgess, Catherine 
APPLICANT: Casman, Stacie 
APPLICANT: Colman, Steven 
APPLICANT: Edinger, Shlomit R. 
APPLICANT: Ellerman, Karen 
APPLICANT: Gerlach, Valerie 
APPLICANT: Gunther, Erik 
APPLICANT: Kekuda, Ramesh 
APPLICANT: MacDougall, John R. 
APPLICANT: Mehraban, Fuad 
APPLICANT: Patturajan, Meera 
APPLICANT: Rothenberg, Mark 
APPLICANT: Shimkets, Richard 
APPLICANT: Smithson, Glennda 
APPLICANT: Spytek, Kimberly A. 
APPLICANT: Stone, David J. 
APPLICANT: Vernet, Corine A.M. 
APPLICANT: Zerhusen, Bryan D. 

TITLE OF INVENTION: PROTEINS, POLYNUCLEOTIDES ENCODING THEM AND METHODS OF 
TITLE OF INVENTION: USING THE SAME 
FILE REFERENCE: 21402-250 (CURA-550) 
CURRENT APPLICATION NUMBER: US/10/052,648A
CURRENT FILING DATE: 2002-12-09 
PRIOR APPLICATION NUMBER: 60/262,454 
PRIOR FILING DATE: 2001-01-18 
PRIOR APPLICATION NUMBER: 60/272,920 



PRIOR FILING DATE: 2001-03-02 
PRIOR APPLICATION NUMBER: 60/284,549 
PRIOR FILING DATE: 2001-04-18 
PRIOR APPLICATION NUMBER: 60/303,229 
PRIOR FILING DATE: 2001-07-05 
PRIOR APPLICATION NUMBER: 60/262,892 
PRIOR FILING DATE: 2001-01-19 
PRIOR APPLICATION NUMBER: 60/263,605 
PRIOR FILING DATE: 2001-01-23 
PRIOR APPLICATION NUMBER: 60/269,098 
PRIOR FILING DATE: 2001-02-15 
PRIOR APPLICATION NUMBER: 60/264,159 
PRIOR FILING DATE: 2001-01-25 
PRIOR APPLICATION NUMBER: 60/265,517 
PRIOR FILING DATE: 2001-01-31 
PRIOR APPLICATION NUMBER: 60/271,855 
PRIOR FILING DATE: 2001-02-27 

Remaining Prior Application data removed - See File Wrapper or PALM. 
NUMBER OF SEQ ID NOS: 97
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 6
LENGTH: 928
TYPE: PRT 

ORGANISM: Homo sapiens 
US-10-052-648A-6 

Query Match 50.0%; Score 1799; DB 15; Length 928; 

Best Local Similarity 49.3%; Pred. No. 5.3e-112; 

Matches 296; Conservative 61; Mismatches 212; Indels 32; Gaps 4 

Qy 14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQI YYTSCTDILNW FKCT 70

I II : I I I I I I 1 I I I I : : I : I I : I I : I I I 

Db 9 LLLAVGLRLAGTLNPSDPNTCSFWESFTTTTKESHSRPFSLLPSEPCE--RPWEGPHTCP 66

Qy 71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130

: I I M I I I I : I :: I I I I I I I I I I I II : I I I I I I : I M I I I I I I 

Db 67 QPTVVYRTVYRQVVKTDHRQRLQCCHGFYESRGFCVPLCAQECVHGRCVAPNQCQCVPGW 126

Qy 131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190 

I : I I I I I I I I | | I : I : I : I I I : I : I I I I I 

Db 127 RGDDCSSECAPGMWGPQCDKPCSCGNNSSCDPKSGVCSCPSGLQPPNCLQPCTPGYYGPA 186

Qy 191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250

I I I I I : I I i I I I I II I I I : I I I I I I I I I I 

Db 187 CQFRCQC-HGAPCDPQTGACFCPAERTGPSCDVSCSQGTSGFFCPSTHSCQNGGVFQTPQ 245

Qy 251 GECSCPSGWM GTVCGQPCPEGRFGKNCSQECQCH 284 

I I I I I I I I I I : I I I I I I I I I I I I I : I I 

Db 246 GSCSCPPGWMVWRVGPVGMGCGSGENSVGGAKQGSKGTICSLPCPEGFHGPNCSQECRCH 305 

Qy 285 NGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAG 344 

I I I I I I I I I I : I I I I I : I I : : II I i I : I 1 I I I I I : I : : I I I I I I I 

Db 306 NGGLCDRFTGQCRCAPGYTGDRCREECPVGRFGQDCAETCDCAPDARCFPANGACLCEHG 365

Qy 345 FAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGF 404

I I : I I I I I I : I I I : I I I : : I I I I I : I I I : I I I I : I I : I I I : I 

Db 366 FTGDRCTDRLCPDGFYGLSCQAPRTCDREHSLSCHPMNGECSCLPGWAGLHCNESCPQDT 425 



Qy 405 YGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVC 464

: I I I : I I : I I : : I I I I I I : I I : : I I I I I : I I I : I I I : I I 
Db 426 HGPGCQERCLCLHGGVCQATSGLCQCAPGYTGPHCASLCPPDTYGVNCSARCSCENAIAC 485 

Qy 465 SPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEK 524

I I : I I I I I I I : i I : I I I I I 1 I I I : M I : I : I I I I I I I I 
Db 486 SPIDGECVCKEGWQRGNCSVPCPPGTWGFSCNASCQCAHEAVCSPQTGACTCTPGWHGAH 545 

Qy 525 CELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLP 584 

hill I : I I I I I I I h I I I I I hi III I I I I I I I I I 

Db 546 CQLPCPKGQFGEGCASRCDCDHSDGCDPVHGRCQCQAGWMGARCHLSCPEGLWGVNCSNT 605

Qy 585 C 585 

I 

Db 606 C 606 



RESULT 13 
US-09-796-753-114 

; Sequence 114, Application US/09796753 

; Publication No. US20030027998A1

; GENERAL INFORMATION: 

; APPLICANT: McCarthy, Sean A. 

TITLE OF INVENTION: SECRETED PROTEINS AND USES THEREOF 
; FILE REFERENCE: 7853-227-999 
; CURRENT APPLICATION NUMBER: US/09/796,753

CURRENT FILING DATE: 2001-03-01 

PRIOR APPLICATION NUMBER: 09/183,175 

PRIOR FILING DATE: 1998-10-30 
; PRIOR APPLICATION NUMBER: 09/223,094 

PRIOR FILING DATE: 1998-12-30 
; PRIOR APPLICATION NUMBER: 09/223,546 
; PRIOR FILING DATE: 1998-12-30 
; PRIOR APPLICATION NUMBER: 09/224,246 
; PRIOR FILING DATE: 1998-12-30 

PRIOR APPLICATION NUMBER: 09/259,388 
; PRIOR FILING DATE: 1999-02-26 
; PRIOR APPLICATION NUMBER: 60/122,458 
; PRIOR FILING DATE: 1999-03-01 
; PRIOR APPLICATION NUMBER: 09/312,359 
; PRIOR FILING DATE: 1999-05-14
; PRIOR APPLICATION NUMBER: 09/336,536 
; PRIOR FILING DATE: 1999-06-18 
; PRIOR APPLICATION NUMBER: 09/342,687 
; PRIOR FILING DATE: 1999-06-29 
; PRIOR APPLICATION NUMBER: 09/345,464 

PRIOR FILING DATE: 1999-06-30 
; PRIOR APPLICATION NUMBER: 09/365,164 
; PRIOR FILING DATE: 1999-07-30 

PRIOR APPLICATION NUMBER: 09/399,723 
; PRIOR FILING DATE: 1999-09-20 
; PRIOR APPLICATION NUMBER: 09/409,634 
; PRIOR FILING DATE: 1999-09-30 
; PRIOR APPLICATION NUMBER: 09/471,179 
; PRIOR FILING DATE: 1999-12-23 
; PRIOR APPLICATION NUMBER: 09/474,071 



; PRIOR FILING DATE: 1999-12-29
; PRIOR APPLICATION NUMBER: 09/474,072
; PRIOR FILING DATE: 1999-12-29
; PRIOR APPLICATION NUMBER: 09/514,010
; PRIOR FILING DATE: 2000-02-25
; PRIOR APPLICATION NUMBER: 09/516,745
; PRIOR FILING DATE: 2000-03-01
; PRIOR APPLICATION NUMBER: 09/572,002
; PRIOR FILING DATE: 2000-05-14
; PRIOR APPLICATION NUMBER: 09/597,993
; PRIOR FILING DATE: 2000-06-19
; PRIOR APPLICATION NUMBER: 09/599,596
; PRIOR FILING DATE: 2000-06-22
; PRIOR APPLICATION NUMBER: 09/630,334
; PRIOR FILING DATE: 2000-07-31
; PRIOR APPLICATION NUMBER: 09/606,565
; PRIOR FILING DATE: 2000-06-29
; PRIOR APPLICATION NUMBER: 09/606,317
; PRIOR FILING DATE: 2000-06-29
; PRIOR APPLICATION NUMBER: 09/665,666
; PRIOR FILING DATE: 2000-09-20
; PRIOR APPLICATION NUMBER: 09/677,751
; PRIOR FILING DATE: 2000-09-30




NUMBER OF SEQ ID NOS: 162 







SEQ ID NO 114 
LENGTH: 1050
TYPE: PRT 

ORGANISM: Homo sapiens 
US-09-796-753-114 



Query Match 46.3%; Score 1667.5; DB 10; Length 1050; 

Best Local Similarity 45.0%; Pred. No. 3.5e-103; 

Matches 284; Conservative 61; Mismatches 181; Indels 105; Gaps 



Qy 14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW 66

III : I II I II I I I I I : : I : I I : II : I I

Db 9 LLLAVGLRLAGTLNPSDPNTCSFWESFTTTTKESHSRPFSLLPSEPCE--RPWEGPHTCP 66

Qy 67 FKCTRHRVSYR TAY 80

I I : I :

Db 67 SPQTQRKLLASRDSFCMVCVGAGVQWRDRSALQPQTGNALSMRPQPRVLSGAPSLASPGH 126

Qy 81 RHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGWGGTNCSSA-- 138

II : I : : III I I I I I III II : I I I I I I : I I I III III I : I I I I

Db 127 TVWKTDHRQRLQCCHGFYESRGFCVPLCAQECVHGRCVAPNQCQCVPGWRGDDCSSAPN 186

Qy 139 CDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQR 194

I : : I I I I I I I : I I I : I I I I I I I I I : i I I I

Db 187 CLQPCTPGYYGPACQFRCQC-HGAPCDPQTGACFCPAERTGPSCDVSCSQGT 237

Qy 195 CQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECS 254

: I I I I I I I I I I I I III

Db 238 SGFFC PSTH PCQNGGVFQTPQGSCS 262

Qy 255 CPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVG 314

II I I I II : I I I I I I I I I I I I I : I I I I I II I I I I I : I I I I I : I I I I I I I

Db 263 CPPGWMGTICSLPCPEGFHGPNCSQECRCHNGGLCDRFTGQCRCAPGYTGDRCREECPVG 322



Qy 315 TYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLEN 374 

: I I I I 1 I I : I : : I I I I I I I I I : I I I I I I : I I I : I II I : 

Db 323 RFGQDCAETCDCAPDARCFPANGACLCEHGFTGDRCTDRLCPDGFYGLSCQAPCTCDREH 382 

Qy 375 THSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTCAPGF 434 

: I I I I I : I ! i : I I I I : I I : I I 1 : I : I I i : I I : I I : : I I I I I h 

Db 383 SLSCHPMNGECSCLPGWAGLHCNESCPQDTHGPGCQEHCLCLHGGVCQATSGLCQCAPGY 442 

Qy 435 KGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFG 494 

I | : : II I I I : I I I : I I I : I I I I : I I I I I I I : I I : I I I I I I I 
Db 443 TGPHCASLCPPDTYGVNCSARCSCENAIACSPIDGECVCKEGWQRGNCSVPCPPGTWGFS 502 

Qy 495 CNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTT 554

I I : I I I : I : I III III I I : I I I I : I II I I I I I : I I I I 

Db 503 CNASCQCAHEAVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASRCDCDHSDGCDPVH 562 

Qy 555 GHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585 

11:11111 I I I I I I I I I 
Db 563 GRCQCQAGWMGARCHLSCPEGLWGVNCSNTC 593



RESULT 14 
US-10-312-352-21 

Sequence 21, Application US/10312352 
Publication No. US20040053824A1
GENERAL INFORMATION:
APPLICANT: INCYTE GENOMICS, INC.; TANG, Y. Tom
APPLICANT: YUE, Henry; AZIMZAI, Yalda
APPLICANT: HE, Ann; BATRA, Sajeev
APPLICANT: LO, Terence P.; NGUYEN, Danniel B.
APPLICANT: BURRILL, John D.; MARCUS, Gregory A.
APPLICANT: ZINGLER, Kurt A.; GANDHI, Ameena R.
APPLICANT: LAL, Preeti G.; KEARNEY, Liam
APPLICANT: BURFORD, Neil; YAO, Monique G.
APPLICANT: CHAWLA, Narinder K.; ELLIOT, Vicki S.
APPLICANT: ARVIZU, Chandra S.; KHAN, Farrah A.
APPLICANT: BAUGHN, Mariah R.; HAFALIA, April, J. A.
APPLICANT: POLICKY, Jennifer L.; AU-YOUNG, Janice K.
APPLICANT: LU, Yan; BOROWSKY, Mark L.
APPLICANT: LU, Dyung Aina M.; RAMKUMAR, Jayalaxmi
APPLICANT: YANG, Junming; GURURAJAN, Rajagopal
APPLICANT: WARREN, Bridget A.; GIETZEN, Kimberly J.
APPLICANT: XU, Yuming; KALLICK, Deborah A.
APPLICANT: LEE, Ernestine A.; THANGAVELU, Kavitha
APPLICANT: DELEGEANE, Angelo M.; LEE, Sally

TITLE OF INVENTION: EXTRACELLULAR MATRIX AND CELL ADHESION MOLECULES 
FILE REFERENCE: PF-0794 USN 

CURRENT APPLICATION NUMBER: US/10/312,352
CURRENT FILING DATE: 2002-12-18 
PRIOR APPLICATION NUMBER: PCT/US01/21067 
PRIOR FILING DATE: 2001-06-29 
PRIOR APPLICATION NUMBER: US 60/215,454 
PRIOR FILING DATE: 2000-06-30 
PRIOR APPLICATION NUMBER: US 60/219,462 
PRIOR FILING DATE: 2000-07-18 
PRIOR APPLICATION NUMBER: US 60/240,111 



PRIOR FILING DATE: 2000-10-12 
PRIOR APPLICATION NUMBER: US 60/240,106 
PRIOR FILING DATE: 2000-10-12 
PRIOR APPLICATION NUMBER: US 60/244,021 
PRIOR FILING DATE: 2000-10-27 
PRIOR APPLICATION NUMBER: US 60/248,887 
PRIOR FILING DATE: 2000-11-14 
PRIOR APPLICATION NUMBER: US 60/249,570 
PRIOR FILING DATE: 2000-11-16 
NUMBER OF SEQ ID NOS : 72 
SOFTWARE: PERL Program 
SEQ ID NO 21 
LENGTH: 1393 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE : 

NAME/KEY: misc_feature

OTHER INFORMATION: Incyte ID No. US20040053824A1 3351332CD1
US-10-312-352-21 

Query Match 37.7%; Score 1356.5; DB 12; Length 1393; 

Best Local Similarity 42.6%; Pred. No. 2.5e-82; 

Matches 232; Conservative 50; Mismatches 207; Indels 56; Gaps 10; 

Qy 92 SQC---CP-GFYESGEMCVPHCADKCVH-GRC-IAPNTCQCEPGWGGTNCSSACDGDHWG 145

I : I I I I : I I i I : I : I I I I I III I :l I I I II I 
Db 706 SRCQDVCPAGWY--GPSCQTRCS--CANDGHCHPATGHCSCAPGWTGFSCQRACDTGHWG 761

Qy 146 PHCTSRCQCKNG-ALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCD 204

II: || I I : I : I I I I I : I I I I : I ! I : I I I I I I I : I I I I 

Db 762 PDCSHPCNCSAGHGSCDAISGLCLCEAGYVGPRCEQQCPQGHFGPGCEQLCQCQHGAACD 821 

Qy 205 HVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVC 264

I I : I I I I I : I I I I III I I II I I I I I I I : I I I 
Db 822 HVSGACTCPAGWRGTFCEHACPAGFFGLDCRSACNCTAGAACDAVNGSCLCPAGRRGPRC 881

Qy 265 GQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300 

: I I : I I I I I I I I I : I I I I I I I : I 
Db 882 AETCPAHTYGHNCSQACACFNGASCDPVHGQCHCAPGWMGPSCLQECLPRDVRAGCRHSG 941

Qy 301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLC 341

I : I I : : I I I I : I I I : I I I I : I I : I I I I 
Db 942 GCLNGGLCDPHTGRCLCPAGWTGDKCQSPCLRGWFGEACAQRCSCPPGAACHHVTGACRC 1001 

Qy 342 EAGFAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCS 401 

III I I : I I I : I I : I I II : I I I : I I : I I : I I : I 

Db 1002 PPGFTGSGCE-QACPPGSFGEDCAQMCQCPGENP-ACHPATGTCSCAAGYHGPSCQQRCP 1059 

Qy 402 PGFYGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKND 461

I I I I I : I : I I I I I I : I I I I Mill: I I I M II : III 
Db 1060 PGRYGPGCEQLCGCLNGGSCDAATGACRCPTGFLGTDCNLTCPQGRFGPNCTHVCGCGQG 1119 

Qy 462 AVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWR 521 

I I M I M I I III II MM M I I II I : M M M II 

Db 1120 AACDPVTGTCLCPPGRAGVRCERGCPQNRFGVGCEHTCSCRNGGLCHASNGSCSCGLGWT 1179 



Qy 522 GEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNC 581

I I I I I III I II: I I i I I I I I I : I I : I 1 I I

Db 1180 GRHCELACPPGRYGAACHLECSCHNNSTCEPATGTCRCGPGFYGQACEHPCPPGFHGAGC 1239

Qy 582 SLPCY 586 

I : 

Db 1240 QGLCW 1244 



RESULT 15 

US-10-369-493-5280 

Sequence 5280, Application US/10369493 
Publication No. US20030233675A1 
GENERAL INFORMATION: 
APPLICANT: Cao, Yongwei 
APPLICANT: Hinkle, Gregory J. 
APPLICANT: Slater, Steven C. 
APPLICANT: Goldman, Barry S. 
APPLICANT: Chen, Xianfeng 

TITLE OF INVENTION: EXPRESSION OF MICROBIAL PROTEINS IN PLANTS FOR PRODUCTION OF
TITLE OF INVENTION: PLANTS WITH IMPROVED PROPERTIES
FILE REFERENCE: 38-10(52052)B
CURRENT APPLICATION NUMBER: US/10/369,493
CURRENT FILING DATE: 2003-02-23
PRIOR APPLICATION NUMBER: US 60/360,039 
PRIOR FILING DATE: 2002-02-21 
NUMBER OF SEQ ID NOS : 47374 
SEQ ID NO 5280 
LENGTH: 1111 
TYPE: PRT 

ORGANISM: Caenorhabditis elegans 
US-10-369-493-5280 

Query Match 35.7%; Score 1284.5; DB 15; Length 1111; 

Best Local Similarity 34.2%; Pred. No. 1.3e-77; 

Matches 246; Conservative 77; Mismatches 221; Indels 175; Gaps 20; 

Qy 21 GTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYT SCTDILNWFKCTRHR 73 

III : : I I : i : : I : : : I I : I I : I 

Db 35 GTTEP QGDHVCT VKTIVDDY--ELKKVIHTWYNDTEQCLNPLTGFQC 80

Qy 74 VSYRTAYRHGEKTMYRRK SQCCPGFYESGE-MCVPHCADKCVHGRCIAPNTC 124 

I : I : I I : I : I I I I : I : : : I : I I I I : II I I 

Db 81 TVEKRGQKASYQRQLVKKEKYVKQCCDGYYQTKDHFCLPDCNPPCKKGKCIEPGKC 136 

Qy 125 QCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCE 179 

: I : I I : I I I : I : I II I : I I : I I I I : I I I I : I I : I I I I 

Db 137 ECDPGYGGKYCASSCSVGTWGLGCSKSCDCENGANCDPELGTCICTSGFQGERCEKPCPD 196 

Qy 180 DRCEQGTYGNDCHQRCQCQNGAT 202

: : I I : I : I : 1 I I I I I I I 

Db 197 NKWGPNCVKSCPCQNGGKCNKEGKCVCSDGWGGEFCLNKCEEGKFGAECKFECNCQNGAT 256 

Qy 203 CDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGT 262 

II: I : I I I I I I I I : i I I I I : I I I I : I I I I III 
Db 257 CDNTNGKCICKSGYHGALCENECSVGFFGSGCTQKCDCLNNQNCDSSSGECKC-IGWTGK 315 



Qy 263 VCGQPCPEGRFGKNCSQECQC HNGGTCDAATGQCHCSPGYTGERCQD-ECPVGT 315 

I I I I I I I I I I : : I I I I I I I I I I I : I : : I 

Db 316 HCDIGCSRGRFGLQCKQNCTCPGLEFSDSNASCDAKTGQCQCESGYKGPKCDERKCDAEQ 375 

Qy 316 YGVLCAETCQCV--NGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLE 373

II I : : I I I I I I : I I I : I I I : I I I : II I : I : I 

Db 376 YGADCSKTCTCVRENTLMCAPNTGFCRCKPGFYGDNCEL-ACSKDSYGPNCEKQAMCDWN 434

Qy 374 NTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSC-QNGADCDSVTGKCTCAP 432 

: I : I : I I I I I I : I N I III I I I I I II I I I I 
Db 435 HASECNPETGSCVCKPGRTGKNCSEPCPLDFYGPNCAHQCQCNQRGVGCDGADGKCQCDR 494

Qy 433 GFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGTWG 492 

I : I I II hill III I I : I I II I I | : | | I | | : : | 

Db 495 GWTGHRCEHHCPADTFGANCEKRCKCPKGIGCDPITGECTCPAGLQGANCDIGCPEGSYG 554 

Qy 493 FGCNLTCQCLNGGACNTLDGTCTCAPGW RGEKCEL--PCQD 531 

I I I I : I : I I I : I I I I I I : I I I I I Ml 

Db 555 PGCKLHCKCVN-GKCDKETGECTCQPGFFGSDCSTTCSKGKYGESCELSCPCSDASCSKQ 613 

Qy 532 GTYGLNCAERCD 543 

II I : : I : : I I 

Db 614 TGKCLCPLGTKGVSCDQKCDPNTFGFLCQETVTPSPCASTDPKNGVCLSCPPGSSGIHCE 673 

Qy 544 CSHAD--GCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585 

I I I I I I I 1 I I I I I I : I I I : I : : I I : I I 

Db 674 HNCPAGSYGDGCQQVCSCADGHGCDPTTGECICEPGYHGKTCSEKCPDGKYGYGCALDC 732



Search completed: March 26, 2004, 16:21:16 
Job time : 28.1611 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 
Run on: March 26, 2004, 16:04:46 ; Search time 28.5191 Seconds
(without alignments)
6483.148 Million cell updates/sec

Title: US-10-092-390-4
Perfect score: 3601
Sequence: 1 MVISLNSCLSFICLLLCHWI...HCDSVCAEGRWGPNCSLPCY 586

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5

Searched: 1017041 seqs, 315518202 residues

Total number of hits satisfying chosen parameters: 1017041

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries



Database : SPTREMBL_25:*
1: sp_archea:*
2: sp_bacteria:*
3: sp_fungi:*
4: sp_human:*
5: sp_invertebrate:*
6: sp_mammal:*
7: sp_mhc:*
8: sp_organelle:*
9: sp_phage:*
10: sp_plant:*
11: sp_rodent:*
12: sp_virus:*
13: sp_vertebrate:*
14: sp_unclassified:*
15: sp_rvirus:*
16: sp_bacteriap:*
17: sp_archeap:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 



Result           Query
   No.   Score   Match  Length  DB  ID      Description
     1    3601   100.0    1140   4  Q96KG7  Q96kg7 homo sapien
     2    3468    96.3     567   4  Q8WUL3  Q8wul3 homo sapien
     3  2343.5    65.1     947  11  Q8BKK7  Q8bkk7 mus musculu
     4    2094    58.2     969   4  Q96KG6  Q96kg6 homo sapien
     5    1828    50.8     747  11  Q8VHF4  Q8vhf4 mus musculu
     6    1828    50.8    1034  11  Q8VHL7  Q8vhl7 mus musculu
     7    1824    50.7    1034  11  Q8VIK5  Q8vik5 mus musculu
     8    1690    46.9    1004  11  Q8CGA7  Q8cga7 mus musculu
     9    1575    43.7     921  11  Q80T91  Q80t91 mus musculu
    10  1372.5    38.1    1574  11  O88281  O88281 rattus norv
    11    1362    37.8     626   4  Q8ND91  Q8nd91 homo sapien
    12    1341    37.2     299  11  Q8BX64  Q8bx64 mus musculu
    13    1340    37.2    1664   5  Q9TVQ2  Q9tvq2 caenorhabdi
    14    1309    36.4     220  11  Q63404  Q63404 rattus norv
    15  1290.5    35.8     881   5  Q9W0A0  Q9w0a0 drosophila
    16  1284.5    35.7    1045   5  Q8T3A6  Q8t3a6 caenorhabdi
    17  1284.5    35.7    1070   5  Q8T3A7  Q8t3a7 caenorhabdi
    18  1284.5    35.7    1111   5  Q9XWD6  Q9xwd6 caenorhabdi
    19  1282.5    35.6    1246   4  O75095  O75095 homo sapien
    20  1230.5    34.2     546  11  Q80V70  Q80v70 mus musculu
    21   843.5    23.4     569   4  Q8NHD4  Q8nhd4 homo sapien
    22     839    23.3     320   4  Q8N780  Q8n780 homo sapien
    23     800    22.2     866   4  Q8IXF3  Q8ixf3 homo sapien
    24     741    20.6     594   5  Q9W0A1  Q9w0a1 drosophila
    25     740    20.5     594   5  Q9Y151  Q9y151 drosophila
    26   708.5    19.7    2447  13  O13149  O13149 fugu rubrip
    27   702.5    19.5     337   4  Q8NHD3  Q8nhd3 homo sapien
    28   702.5    19.5     342   4  Q8NHD5  Q8nhd5 homo sapien
    29     693    19.2    2516  11  Q7TQ52  Q7tq52 mus musculu
    30     693    19.2    2526  11  Q7TQ51  Q7tq51 mus musculu
    31     693    19.2    2531  11  Q8K428  Q8k428 mus musculu
    32     693    19.2    2531  11  Q7TQ50  Q7tq50 mus musculu
    33     687    19.1     744   4  Q8NHD2  Q8nhd2 homo sapien
    34   685.5    19.0    2524   5  Q9GPA5  Q9gpa5 branchiosto
    35   682.5    19.0    2468  13  Q800E4  Q800e4 brachydanio
    36     682    18.9    4288   4  Q9NPK9  Q9npk9 homo sapien
    37   681.5    18.9    2531   5  O16004  O16004 lytechinus
    38     681    18.9    2653   5  Q25253  Q25253 lucilia cup
    39     678    18.8    1193  13  Q90819  Q90819 gallus gall
    40   676.5    18.8    4006  11  O35452  O35452 mus musculu
    41   672.5    18.7    4135   6  O18977  O18977 bos taurus
    42   667.5    18.5    4114  11  O54796  O54796 mus musculu
    43     666    18.5    2528  13  Q8AXP0  Q8axp0 cynops pyrr
    44     664    18.4    1214  13  Q90YD2  Q90yd2 xenopus lae
    45   657.5    18.3    2428   5  Q8I6X6  Q8i6x6 boophilus m



ALIGNMENTS 



RESULT 1 
Q96KG7 

ID Q96KG7 PRELIMINARY; PRT; 1140 AA.
AC Q96KG7;
DT 01-DEC-2001 (TrEMBLrel. 19, Created)
DT 01-DEC-2001 (TrEMBLrel. 19, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE MEGF10 protein (Hypothetical protein KIAA1780).
GN MEGF10.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=Hippocampus;
RX MEDLINE=21245130; PubMed=11347906;
RA Nagase T., Nakayama M., Nakajima D., Kikuno R., Ohara O.;
RT "Prediction of the coding sequences of unidentified human genes. XX.
RT The complete sequences of 100 new cDNA clones from brain which code
RT for large proteins in vitro.";
RL DNA Res. 8:85-95(2001).
DR EMBL; AB058676; BAB47409.1; -.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR009030; Grow_fac_recep.
DR InterPro; IPR002049; Laminin_EGF.
DR Pfam; PF00008; EGF; 10.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00180; EGF_Lam; 6.
DR PROSITE; PS00022; EGF_1; 17.
DR PROSITE; PS01186; EGF_2; 17.
KW Hypothetical protein; EGF-like domain; Laminin EGF-like domain.
SQ SEQUENCE 1140 AA; 122204 MW; 45B2FA239423895A CRC64;

Query Match 100.0%; Score 3601; DB 4; Length 1140;

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 586; Conservative 0; Mismatches 0; Indels 0; Gaps 0 
Qy      1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60

Qy     61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120

Qy    121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180

Qy    181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240

Qy    241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300

Qy    301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360

Qy    361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420

Qy    421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480

Qy    481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540

Qy    541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586
          ||||||||||||||||||||||||||||||||||||||||||||||
Db    541 RCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPCY 586



RESULT 2 
Q8WUL3 

ID Q8WUL3 PRELIMINARY; PRT; 567 AA.
AC Q8WUL3;
DT 01-MAR-2002 (TrEMBLrel. 20, Created)
DT 01-MAR-2002 (TrEMBLrel. 20, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE Similar to MEGF10 protein.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=Muscle;
RA Strausberg R.;
RL Submitted (DEC-2001) to the EMBL/GenBank/DDBJ databases.
DR EMBL; BC020198; AAH20198.1; -.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR002049; Laminin_EGF.
DR Pfam; PF00008; EGF; 7.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00180; EGF_Lam; 4.
DR PROSITE; PS00022; EGF_1; 10.
DR PROSITE; PS01186; EGF_2; 10.
KW EGF-like domain; Laminin EGF-like domain.
SQ SEQUENCE 567 AA; 60797 MW; CF2FB8CDEB7CF627 CRC64;

Query Match 96.3%; Score 3468; DB 4; Length 567; 

Best Local Similarity 99.8%; Pred. No. 0; 

Matches 565; Conservative 1; Mismatches 0; Indels 0; Gaps 0; 

Qy      1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 MVISLNSCLSFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSC 60

Qy     61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIA 120

Qy    121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 PNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCED 180

Qy    181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 RCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPC 240

Qy    241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 QNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSP 300

Qy    301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 GYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLY 360

Qy    361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 GIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGAD 420

Qy    421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    421 CDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGV 480

Qy    481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    481 DCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAE 540

Qy    541 RCDCSHADGCHPTTGHCRCLPGWSGV 566
          |||||||||||||||||||||||||:
Db    541 RCDCSHADGCHPTTGHCRCLPGWSGL 566



RESULT 3 
Q8BKK7 

ID Q8BKK7 PRELIMINARY; PRT; 947 AA.
AC Q8BKK7;
DT 01-MAR-2003 (TrEMBLrel. 23, Created)
DT 01-MAR-2003 (TrEMBLrel. 23, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE MEGF11 protein.
GN 2410080H04RIK.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=C57BL/6J; TISSUE=Dorsal root ganglion;
RX MEDLINE=22354683; PubMed=12466851;
RA The FANTOM Consortium,
RA the RIKEN Genome Exploration Research Group Phase I & II Team;
RT "Analysis of the mouse transcriptome based on functional annotation of
RT 60,770 full-length cDNAs.";
RL Nature 420:563-573(2002).
DR EMBL; AK051642; BAC34702.1;
DR MGD; MGI:1920951; 2410080H04Rik.
DR GO; GO:0016020; C:membrane; IEA.
DR GO; GO:0045285; C:ubiquinol-cytochrome-c reductase complex; IEA.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR GO; GO:0008121; F:ubiquinol-cytochrome-c reductase activity; IEA.
DR GO; GO:0006118; P:electron transport; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR006210; IEGF.
DR InterPro; IPR002049; Laminin_EGF.
DR InterPro; IPR005805; Rieske.
DR Pfam; PF00008; EGF; 11.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00181; EGF; 15.
DR SMART; SM00180; EGF_Lam; 15.
DR PROSITE; PS00022; EGF_1; 15.
DR PROSITE; PS01186; EGF_2; 15.
DR PROSITE; PS00200; RIESKE_2; 1.
SQ SEQUENCE 947 AA; 100661 MW; 0C209B11DFEE8314 CRC64;





Query Match 65.1%; Score 2343.5; DB 11; Length 947; 

Best Local Similarity 63.7%; Pred. No. 1.2e-215; 

Matches 364; Conservative 55; Mismatches 121; Indels 31; Gaps 1 

Qy 15 LLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNWFKCTRHRV 74

I i : I II I I I I I I I I I I I I : I I I I I I I I I I II I I I I I I I I M I I I I I II : 
Db 8 LLVFLLQAALALNPEDPNVCSHWESYAVTVQESYAHPFDQIYYTRCADILNWFKCTRHRI 67

Qy 75 SYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGWGGTN 134

I I : I I I I I : I I I I I : I I I I I I : I I : I : I : 
Db 68 SYKTAYRRGLRTMYRRRSQCCPGYYENGDFCI---------------------------- 99

Qy 135 CSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQR 194

II : I I I I II : : I I I I : I I I I I I I I I I I I II I I I I I I I I : I I I : I I
Db 100 ---RCDSEHWGPHCSNRCQCQNGALCNPITGACVCAPGFRGWRCEELCAPGTHGKGCQLL 156

Qy 195 CQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECS 254 

I I I : I I : I I I I I I I I I I I I : I I : I I I I I II II I I I I I I I I I I I : I I I I : 

Db 157 CQCHHGASCDPRTGECLCAPGYTGVYCEELCPPGSHGAHCELRCPCQNGGTCHHITGECA 216 

Qy 255 CPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVG 314 

II II I II I I I I I I I : I I I I : I I I : I I II I I I I II : II I : I I I : I I I I 

Db 217 CPPGWTGAVCAQPCPPGTFGQNCSQDCPCHHGGQCDHVTGQCHCTAGYMGDRCQEECPFG 276 

Qy 315 TYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLEN 374

1 : I I I : : I I I I I : ! : I I I I I I : I I : I I I I I I I : I I ill II 
Db 277 TFGFLCSQRCDCHNGGQCSPATGACECEPGYKGPSCQERLCPEGLHGPGCTLPCPCDTEN 336 

Qy 375 THSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTCAPGF 434 

I I I I I : : I I I : I I I I I I I I I : I I : I I II I : I I I I I I I I : I I I I I I I I I 

Db 337 TISCHPVTGACTCQPGWSGHYCNESCPAGYYGNGCQLPCTCQNGADCHSITGSCTCAPGF 396 

Qy 435 KGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFG 494

I I : I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I : I I I I I I I 
Db 397 MGEVCAVPCAAGTYGPNCSSVCSCSNGGTCSPVDGSCTCREGWQGLDCSLPCPSGTWGLN 456 

Qy 495 CNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTT 554

II II I II II: I I : I I III I : I I I I I I I I : I I I I : I I I I I I I I II I I 

Db 457 CNETCICANGAACSPFDGSCACTPGWLGDSCELPCPDGTFGLNCSEHCDCSHADGCDPVT 516 



Qy 555 GHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585

I I I I I I I : I : I I I I I I I I I I I I : I 

Db 517 GHCCCLAGWTGIRCDSTCPPGRWGPNCSVSC 547 



RESULT 4 
Q96KG6 

ID Q96KG6 PRELIMINARY; PRT; 969 AA. 

AC Q96KG6;
DT 01-DEC-2001 (TrEMBLrel. 19, Created)
DT 01-DEC-2001 (TrEMBLrel. 19, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE MEGF11 protein (Hypothetical protein KIAA1781).
GN MEGF11.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=Brain;
RX MEDLINE=21245130; PubMed=11347906;
RA Nagase T., Nakayama M., Nakajima D., Kikuno R., Ohara O.;
RT "Prediction of the coding sequences of unidentified human genes. XX.
RT The complete sequences of 100 new cDNA clones from brain which code
RT for large Proteins in vitro.";
RL DNA Res. 8:85-95(2001).
DR EMBL; AB058677; BAB47410.1; -.
DR GO; GO:0016020; C:membrane; IEA.
DR GO; GO:0045285; C:ubiquinol-cytochrome-c reductase complex; IEA.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR GO; GO:0008121; F:ubiquinol-cytochrome-c reductase activity; IEA.
DR GO; GO:0006118; P:electron transport; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR002049; Laminin_EGF.
DR InterPro; IPR005805; Rieske.
DR Pfam; PF00008; EGF; 12.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00180; EGF_Lam; 8.
DR PROSITE; PS00022; EGF_1; 17.
DR PROSITE; PS01186; EGF_2; 17.
DR PROSITE; PS00200; RIESKE_2; 1.
KW Hypothetical protein; EGF-like domain; Laminin EGF-like domain.
SQ SEQUENCE 969 AA; 101600 MW; 56DD2FFE139C8209 CRC64;

Query Match 58.2%; Score 2094; DB 4; Length 969; 

Best Local Similarity 65.4%; Pred. No. 9.1e-192; 

Matches 312; Conservative 58; Mismatches 107; Indels 0; Gaps 0 

Qy 109 CADKCVHGRCIAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACH 168

I = : I I I I I I : : ! : I I I I I I I I I : I I I II I I I I I I I :: I II I : I I I I I I I I I I I I 
Db 28 CTEECVHGRCVSPDTCHCEPGWGGPDCSSGCDSDHWGPHCSNRCQCQNGALCNPITGACV 87 

Qy 169 CAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPG 228

I 1 I I I I I I I I I : I I I : I I I I I :: I I : I I I I I I I I I I I : I I : I M I I 
Db 88 CAAGFRGWRCEELCAPGTHGKGCQLPCQCRHGASCDPRAGECLCAPGYTGVYCEELCPPG 147



Qy 229 KHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGT 288

I I M M I I I I I I I I I : I I I I : M I I I II I I I I I I I : I I I ! : I I I : I I 
Db 148 SHGAHCELRCPCQNGGTCHHITGECACPPGWTGAVCAQPCPPGTFGQNCSQDCPCHHGGQ 207 

Qy 289 CDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGE 348

II I I I I I I : I I I : I I I : I I I I :: I I : : I I I I I : | : I I I M I : I 
Db 208 CDHVTGQCHCTAGYMGDRCQEECPFGSFGFQCSQRCDCHNGGQCSPTTGACECEPGYKGP 267 

Qy 349 RCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEA 408 

I I : I I I I I I I : I I III : I I I I I I :: I I I : II I I I : I I I : I I : | | : 
Db 268 RCQERLCPEGLHGPGCTLPCPCDADNTISCHPVTGACTCQPGWSGHHCNESCPVGYYGDG 327

Qy 409 CQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVD 468 

M I : I I I I I I I I : I I I I I I I I I I I : I Nil Mil I I I I I II 
Db 328 CQLPCTCQNGADCHSITGGCTCAPGFMGEVCAVSCAAGTYGPNCSSICSCNNGGTCSPVD 387 

Qy 469 GSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELP 528

M I I 1 I II I : I I : : II II I II II : I I II I I : : II : I : I | | I I : I I I I 
Db 388 GSCTCKEGWQGLDCTLPCPSGTWGLNCNESCTCANGAACSPIDGSCSCTPGWLGDTCELP 447 

Qy 529 CQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585 

I I I I : I I I I : I I I I I I I I I | | | | || | | : | : | | | | : | 

Db 448 CPDGTFGLNCSEHCDCSHADGCDPVTGHCCCLAGWTGIRCDSTCPPGRWGPNCSVSC 504 



RESULT 5 
Q8VHF4 

ID Q8VHF4 PRELIMINARY; PRT; 747 AA. 

AC Q8VHF4; 

DT 01-MAR-2002 (TrEMBLrel. 20, Created)
DT 01-MAR-2002 (TrEMBLrel. 20, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE Jedi-736 protein.
GN 3110045G13RIK.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=C57BL; TISSUE=Testis;
RA Krivtsov A.V., Zinovyeva M.V., Hendrikx J., Visser J.W.M.,
RA Belyavsky A.V.;
RT "Jedi is a novel DSL and EGF-like repeat motif-containing protein
RT expressed on non-differentiated hematopoietic cells.";
RL Submitted (DEC-2001) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AF461685; AAL66380.1; -.
DR MGD; MGI:1920432; 3110045G13Rik.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR009030; Grow_fac_recep.
DR InterPro; IPR002049; Laminin_EGF.
DR Pfam; PF00008; EGF; 6.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00180; EGF_Lam; 4.
DR PROSITE; PS00022; EGF_1; 13.
DR PROSITE; PS01186; EGF_2; 12.
KW EGF-like domain; Laminin EGF-like domain.
SQ SEQUENCE 747 AA; 78972 MW; F825F8F384D4736A CRC64;

Query Match 50.8%; Score 1828; DB 11; Length 747;

Best Local Similarity 52.0%; Pred. No. 2e-166; 

Matches 299; Conservative 59; Mismatches 211; Indels 6; Gaps 3 

Qy 14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW---FKCT 70

Ml : II I I I I I : I I I : : I : I I : I I : | | | | 

Db 7 LLLALGLRLTGTLNSNDPNVCTFWESFTTTTKESHLRPFSLLPAESCH--RPWEDPHTCA 64

Qy 71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130

: I : II I I : I I I I I I II : I I I I I I : I I I I I I I I I 

Db 65 QPTWYRTVYRQWKMDSRPRLQCCRGYYESRGACVPLCAQECVHGRCVAPNQCQCAPGW 124

Qy 131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190

I : I I I I I I I I I I I : I : I : I I I I : | : | | | | | 

Db 125 RGGDCSSECAPGMWGPQCDKFCHCGNNSSCDPKSGACFCPSGLQPPNCLQPCPAGHYGPA 184 

Qy 191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250

I I M I I : I I I I M II i I I I I I : I I M I I I 
Db 185 CQFDCQCY-GASCDPQDGACFCPPGRAGPSCNVPCSQGTDGFFCPRTYPCQNGGVPQGSQ 243 

Qy 251 GECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDE 310 

I I M I I I I I : I I I I I I I I I : I I I : M I I I II I I M I I : I I I I : I I I : I 
Db 244 GSCSCPPGWMGVICSLPCPEGFHGPNCTQECRCHNGGLCDRFTGQCHCAPGYIGDRCQEE 303 

Qy 311 CPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPC 370

I I I I : I I I II I I I : I : : I I I I I I M I : I I I I I I : I I I : I : I I 
Db 304 CPVGRFGQDCAETCDCAPGARCFPANGACLCEHGFTGDRCTERLCPDGRYGLSCQEPCTC 363 

Qy 371 HLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTC 430 

I : : I I I I I I I I : I : I I I : I I : I M : I : I I I : I I : I I : : I I I 

Db 364 DPEHSLSCHPMHGECSCQPGWAGLHCNESCPQDTHGPGCQEHCLCLHGGLCLADSGLCRC 423

Qy 431 APGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGT 490 

I I I : I I : M II I I I I I I I I I : I I I I : M : I II I I : I I : I I I I 
Db 424 APGYTGPHCANLCPPDTYGINCSSRCSCENAIACSPIDGTCICKEGWQRGNCSVPCPLGT 483 

Qy 491 WGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGC 550

M I I I : I I I : I I : I III III I hill I : I II III: 
Db 484 WGFNCNASCQCAHDGVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASVCDCDHSDGC 543

Qy 551 HPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585 

I I Ml I I I I I I I I I I I I I 
Db 544 DPVHGQCRCQAGWMGTRCHLPCPEGFWGANCSNTC 578

RESULT 6 
Q8VHL7 



ID Q8VHL7 PRELIMINARY; PRT; 1034 AA. 

AC Q8VHL7; 

DT 01-MAR-2002 (TrEMBLrel. 20, Created)
DT 01-MAR-2002 (TrEMBLrel. 20, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE Jedi protein.
GN 3110045G13RIK.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=C57BL; TISSUE=Testis;
RA Krivtsov A.V., Zinovyeva M.V., Hendrikx J., Visser J.W.M.,
RA Belyavsky A.V.;
RT "Jedi is a novel DSL and EGF-like repeat motif-containing protein
RT expressed on non-differentiated hematopoietic cells.";
RL Submitted (NOV-2001) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AF444274; AAL38571.1;
DR MGD; MGI:1920432; 3110045G13Rik.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR009030; Grow_fac_recep.
DR InterPro; IPR002049; Laminin_EGF.
DR Pfam; PF00008; EGF; 6.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00180; EGF_Lam; 4.
DR PROSITE; PS00022; EGF_1; 13.
DR PROSITE; PS01186; EGF_2; 12.
KW EGF-like domain; Laminin EGF-like domain.
SQ SEQUENCE 1034 AA; 110540 MW; 5514E5166AE01111 CRC64;



Query Match 50.8%; Score 1828; DB 11; Length 1034; 

Best Local Similarity 52.0%; Pred. No. 2.8e-166; 

Matches 299; Conservative 59; Mismatches 211; Indels 6; Gaps 3 

Qy 14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW---FKCT 70

III : II I I I I I : I I I : : I : II : II : II | | 

Db 7 LLLALGLRLTGTLNSNDPNVCTFWESFTTTTKESHLRPFSLLPAESCH--RPWEDPHTCA 64

Qy 71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130 

: I I I I M I I : I I I I : I I I I II I I : I I I II I : I I I II I II I 
Db 65 QPTWYRTVYRQWKMDSRPRLQCCRGYYESRGACVPLCAQECVHGRCVAPNQCQCAPGW 124 

Qy 131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190 

I : I I I I I I I I I I I : I : I : I I I I : I : I I I II 

Db 125 RGGDCSSECAPGMWGPQCDKFCHCGNNSSCDPKSGACFCPSGLQPPNCLQPCPAGHYGPA 184 

Qy 191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250

= I I I I I I I I I I I I I I : I I I I I I I 

Db 185 CQFDCQCY-GASCDPQDGACFCPPGRAGPSCNVPCSQGTDGFFCPRTYPCQNGGVPQGSQ 243 

Qy 251 GECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDE 310 

I I I I I I I I I : I I I I I I I I I : I I I : I I I I I II I I I I I I : I I I I : I I I : I 
Db 244 GSCSCPPGWMGVICSLPCPEGFHGPNCTQECRCHNGGLCDRFTGQCHCAPGYIGDRCQEE 303 

Qy 311 CPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPC 370 

1111:1 I i I I I I I : I : : I I I I I I I I I : I I I I II : I I I : I : I I 
Db 304 CPVGRFGQDCAETCDCAPGARCFPANGACLCEHGFTGDRCTERLCPDGRYGLSCQEPCTC 363 

Qy 371 HLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTC 430 

I : : I I I I I I I I : | : | M : I I : | | | : | : | I I : I I : I I : : I I I 



Db 364 DPEHSLSCHPMHGECSCQPGWAGLHCNESCPQDTHGPGCQEHCLCLHGGLCLADSGLCRC 423



Qy 431 APGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGT 490

I I I : I I : II I I I I I I I I I I I : I I M : I I : I I I I I : I I : I I M 

Db 424 APGYTGPHCANLCPPDTYGINCSSRCSCENAIACSPIDGTCICKEGWQRGNCSVPCPLGT 483 

Qy 491 WGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGC 550

I I I I I : I I I : I I : I I I I I I I I I : I I I I : I I | I II I : I I I 
Db 484 WGFNCNASCQCAHDGVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASVCDCDHSDGC 543 

Qy 551 HPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585 

I I I I I III I I II I I I I I I 
Db 544 DPVHGQCRCQAGWMGTRCHLPCPEGFWGANCSNTC 578



RESULT 7 
Q8VIK5 
ID Q8VIK5 PRELIMINARY; PRT; 1034 AA.
AC Q8VIK5;
DT 01-MAR-2002 (TrEMBLrel. 20, Created)
DT 01-MAR-2002 (TrEMBLrel. 20, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE MEGF12.
GN 3110045G13RIK OR MEGF12.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=C57BL/6; TISSUE=Liver;
RA Ivanova N.B., Lemischka I.R.;
RT "The global gene expression profiling of the hematopoietic stem
RT cell.";
RL Submitted (OCT-2001) to the EMBL/GenBank/DDBJ databases.
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=C57BL/6J; TISSUE=Eye;
RX MEDLINE=22354683; PubMed=12466851;
RA The FANTOM Consortium,
RA the RIKEN Genome Exploration Research Group Phase I & II Team;
RT "Analysis of the mouse transcriptome based on functional annotation of
RT 60,770 full-length cDNAs.";
RL Nature 420:563-573(2002).
DR EMBL; AF440279; AAL33583.1; -.
DR EMBL; AK053551; BAC35426.1; -.
DR MGD; MGI:1920432; 3110045G13Rik.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR009030; Grow_fac_recep.
DR InterPro; IPR002049; Laminin_EGF.
DR Pfam; PF00008; EGF; 6.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00180; EGF_Lam; 4.
DR PROSITE; PS00022; EGF_1; 13.
DR PROSITE; PS01186; EGF_2; 12.
KW EGF-like domain; Laminin EGF-like domain.
SQ SEQUENCE 1034 AA; 110580 MW; 714E5016848E4E4C CRC64;

Query Match 50.7%; Score 1824; DB 11; Length 1034;
Best Local Similarity 51.8%; Pred. No. 6.7e-166;
Matches 298; Conservative 59; Mismatches 212; Indels 6; Gaps 3


Qy 14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW---FKCT 70

I M : II I I I I I : I I I : : I : I I : II : | | | |

Db 7 LLLALGLRLTGTLNSNDPNVCTFWESFTTTTKESHLRPFSLLPAESCH--RPWEDPHTCA 64

Qy 71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130

: I I I I I I I I : I II I : I I I I I I II : I I I I I I : I I I M I I M

Db 65 QPTWYRTVYRQWKMDSRPRLQCCRGYYESRGACVPLCAQECVHGRCVAPNQCQCAPGW 124

Qy 131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190

I : I I I I IN I I I I : I : I : I I I : I : I | | | |

Db 125 RGGDCSSECAPGMWGPQCDKFCHCGNNSSCDPKSGTCFCPSGLQPPNCLQPCPAGHYGPA 184



Qy 191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250 

I Ml I I : II I I I II I I I I I | | : M I II I I 
Db 185 CQFDCQCY-GASCDPQDGACFCPPGRAGPSCNVPCSQGTDGFFCPRTYPCQNGGVPQGSQ 243 

Qy 251 GECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDE 310 

I I I I I I I I I : I I I I | I I 11:111:11111 II I I II I I : I II I : I II : I 
Db 244 GSCSCPPGWMGVICSLPCPEGFHGPNCTQECRCHNGGLCDRFTGQCHCAPGYIGDRCQEE 303 

Qy 311 CPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPC 370

I I I I : I I II I I I I : I : : I ! I I I I M I : I I I I M : I I I : I : I I 
Db 304 CPVGRFGQDCAETCDCAPGARCFPANGACLCEHGFTGDRCTERLCPDGRYGLSCQEPCTC 363 

Qy 371 HLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTC 430 

I I I I I I I I I : I : I I I : I I : I I I : I : I I I : I I : I I : : I I I 

Db 364 DPEHSLSCHPMHGECSCQPGWAGLHCNESCPQDTHGPGCQEHCLCLHGGLCLADSGLCRC 423 

Qy 431 APGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGT 490

IN: I I : II I I II I II I II I : I I II : I I : I I I | | : I I : I I I I 
Db 424 APGYTGPHCANLCPPDTYGINCSSRCSCENAIACSPIDGTCICKEGWQRGNCSVPCPLGT 483

Qy 491 WGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGC 550 

I I I I I : I I I : I I : I I I I I I I I hill I : I I I Nihil 
Db 484 WGFNCNASCQCAHDGVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASVCDCDHSDGC 543 



Qy 551 HPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585 

I I I I I I I I I I I I I I I I I I 
Db 544 DPVHGQCRCQAGWMGTRCHLPCPEGFWGANCSNTC 578



RESULT 8 
Q8CGA7 

ID Q8CGA7 PRELIMINARY; PRT; 1004 AA. 

AC Q8CGA7; 

DT 01-MAR-2003 (TrEMBLrel. 23, Created) 

DT 01-MAR-2003 (TrEMBLrel. 23, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Similar to RIKEN cDNA 3110045G13 gene. 

GN 3110045G13RIK. 

OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=FVB/N;
RA Strausberg R.;
RL Submitted (JAN-2003) to the EMBL/GenBank/DDBJ databases.
DR EMBL; BC042490; AAH42490.1;
DR MGD; MGI:1920432; 3110045G13Rik.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR009030; Grow_fac_recep.
DR InterPro; IPR006210; IEGF.
DR InterPro; IPR002049; Laminin_EGF.
DR Pfam; PF00008; EGF; 6.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00181; EGF; 14.
DR SMART; SM00180; EGF_Lam; 12.
DR PROSITE; PS00022; EGF_1; 12.
DR PROSITE; PS01186; EGF_2; 11.
SQ SEQUENCE 1004 AA; 107377 MW; 9508B0EC04561E94 CRC64;

Query Match 46.9%; Score 1690; DB 11; Length 1004;

Best Local Similarity 49.2%; Pred. No. 4.4e-153; 

Matches 283; Conservative 52; Mismatches 204; Indels 36; Gaps 4 

Qy 14 LLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNW---FKCT 70

Ml : II I I I I I : I I I : : I : I | : | | : | | | | 

Db 7 LLLALGLRLTGTLNSNDPNVCTFWESFTTTTKESHLRPFSLLPAESCH--RPWEDPHTCA 64

Qy 71 RHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGW 130

• I M I II I I : I I I I : I I I I I I II : I I I I I I : I I I I I 1 [ | | 
Db 65 QPTWYRTVYRQWKMDSRPRLQCCRGYYESRGACVPLCAQECVHGRCVAPNQCQCAPGW 124

Qy 131 GGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGND 190

I : I I I I I I I I I I I : I : I : I I I I : I : I | | | | 

Db 125 RGGDCSSECAPGMWGPQCDKFCHCGNNSSCDPKSGACFCPSGLQPPNCLQPCPAGHYGPA 184 

Qy 191 CHQRCQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVT 250

! Ml M : M I I MM I I I I I | : II I II II 
Db 185 CQFDCQCY-GASCDPQDGACFCPPGRAGPSCNVPCSQGTDGFFCPRTYPCQNGGVPQGSQ 243 

Qy 251 GECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDE 310

Db 244 GSCSCPPGWMGVICSLPCPEGFHGPNCTQECRCHNGGLCDRFTGQCHCAPGYIGDRCQEE 303

Qy 311 CPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPC 370

1111 • 1 1 • i • • i j i i i i i i iMi iiii'i i i • i . i i

Db 304 CPVGRFGQDCAETCDCAPGARCFPANGACLCEHGFTGDRCTERLCPDGRYGLSCQEPCTC 363

Qy 371 HLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTC 430

1 ■ • i i - i i i i i

Db 364 DPEHSLSCHPMH------------------------------CLCLHGGLCLADSGLCRC 393

Qy 431 APGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGT 490

I II : I M I I II I II II M I I : I I I I : I I : I II II MM II II

Db 394 APGYTGPHCANLCPPDTYGINCSSRCSCENAIACSPIDGTCICKEGWQRGNCSVPCPLGT 453

Qy 491 WGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGC 550

I I I I I : I I I : I I : I I I I I II I hill I : I | | I I I I : I I I 
Db 454 WGFNCNASCQCAHDGVCSPQTGACTCTPGWHGAHCQLPCPKGQFGEGCASVCDCDHSDGC 513 

Qy 551 HPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585

I I I I I III I I I I I I I I I I 
Db 514 DPVHGQCRCQAGWMGTRCHLPCPEGFWGANCSNTC 548



RESULT 9
Q80T91
ID Q80T91 PRELIMINARY; PRT; 921 AA.
AC Q80T91;
DT 01-JUN-2003 (TrEMBLrel. 24, Created)
DT 01-JUN-2003 (TrEMBLrel. 24, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE MKIAA1781 protein (Fragment).
GN MKIAA1781.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=Brain;
RX MEDLINE=22579291; PubMed=12693553;
RA Okazaki N., Kikuno R., Ohara R., Inamoto S., Aizawa H., Yuasa S.,
RA Nakajima D., Nagase T., Ohara O., Koga H.;
RT "Prediction of the coding sequences of mouse homologues of KIAA gene:
RT II. The complete nucleotide sequences of 400 mouse KIAA-homologous
RT cDNAs identified by screening of terminal sequences of cDNA clones
RT randomly sampled from size-fractionated libraries.";
RL DNA Res. 10:35-48(2003).
DR EMBL; AK122555; BAC65837.1;
DR GO; GO:0016020; C:membrane; IEA.
DR GO; GO:0045285; C:ubiquinol-cytochrome-c reductase complex; IEA.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR GO; GO:0008121; F:ubiquinol-cytochrome-c reductase activity; IEA.
DR GO; GO:0006118; P:electron transport; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR006210; IEGF.
DR InterPro; IPR002049; Laminin_EGF.
DR InterPro; IPR005805; Rieske.
DR Pfam; PF00008; EGF; 10.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00181; EGF; 14.
DR SMART; SM00180; EGF_Lam; 14.
DR PROSITE; PS00022; EGF_1; 14.
DR PROSITE; PS01186; EGF_2; 14.
DR PROSITE; PS00200; RIESKE_2; 1.
FT NON_TER 1 1
SQ SEQUENCE 921 AA; 97316 MW; 60A34D9513A600F7 CRC64;

Query Match 43.7%; Score 1575; DB 11; Length 921;
Best Local Similarity 65.7%; Pred. No. 4.1e-142;
Matches 237; Conservative 35; Mismatches 89; Indels 0; Gaps 0



Qy 225 CPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCH 284

1 1 1 1 II I 1 1 1 1 1 II : 1 1 1 1 : 1 1 II 1 || MM I || : M 1 M 1 II

Db 1 CPPGSHGAHCELRCPCQNGGTCHHITGECACPPGWTGAVCAQPCPPGTFGQNCSQDCPCH 60

Qy 285 NGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAG 344

•IMI II 1 1 1 1 : II 1 : M 1 : 1 M 1 1 : 1 II : : | I | | | : | MM M 1

Db 61 HGGQCDHVTGQCHCTAGYMGDRCQEECPFGTFGFLCSQRCDCHNGGQCSPATGACECEPG 120

Qy 345 FAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGF 404

: 1 1:1111111:1 1 M 1 1 M II M : : I 1 1 : 1 I I M II 1 1 M 1 :

Db 121 YKGPSCQERLCPEGLHGPGCTLPCPCDTENTISCHPVTGACTCQPGWSGHYCNESCPAGY 180

Qy 405 YGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVC 464

M II 1 : M 1 1 M 1 IMI 1 1 1 M M 1 MM 1 M 1 II II 1 1 1 I

Db 181 YGNGCQLPCTCQNGADCHSITGSCTCAPGFMGEVCAVPCAAGTYGPNCSSVCSCSNGGTC 240

Qy 465 SPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEK 524

1 M 1 II 1 M : 1 1 1 M M : 1 II 1 1 II 1 1 1 1 1 I | M : 1 1 M 1 1 1 1 1 :

Db 241 SPVDGSCTCREGWQGLDCSLPCPSGTWGLNCNETCICANGAACSPFDGSCACTPGWLGDS 300

Qy 525 CELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLP 584

MIM II 1 M 1 II M 1 II M M II 1 1 II 1 II MM: Ml 1 1 1 1 II I I I :

Db 301 CELPCPDGTFGLNCSEHCDCSHADGCDPVTGHCCCLAGWTGIRCDSTCPPGRWGPNCSVS 360

Qy 585 C 585
       |
Db 361 C 361





RESULT 10
O88281
ID O88281 PRELIMINARY; PRT; 1574 AA.
AC O88281;
DT 01-NOV-1998 (TrEMBLrel. 08, Created)
DT 01-NOV-1998 (TrEMBLrel. 08, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE MEGF6.
GN MEGF6.
OS Rattus norvegicus (Rat).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
OX NCBI_TaxID=10116;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=Sprague-Dawley; TISSUE=Brain;
RX MEDLINE=98360089; PubMed=9693030;
RA Nakayama M., Nakajima D., Nagase T., Nomura N., Seki N., Ohara O.;
RT "Identification of high-molecular-weight proteins with multiple EGF-
RT like motifs by motif-trap screening.";
RL Genomics 51:27-34(1998).
DR EMBL; AB011532; BAA32462.1;
DR PIR; T13954; T13954.
DR HSSP; P00736; 1APQ.
DR GO; GO:0005509; F:calcium ion binding; IEA.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR InterPro; IPR000152; Asx_hydroxyl_S.
DR InterPro; IPR001881; EGF_Ca.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR002049; Laminin_EGF.
DR Pfam; PF00008; EGF; 20.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00179; EGF_CA; 4.
DR PROSITE; PS00010; ASX_HYDROXYL; 5.
DR PROSITE; PS00022; EGF_1; 23.
DR PROSITE; PS01186; EGF_2; 23.
DR PROSITE; PS01187; EGF_CA; 5.
KW EGF-like domain.
SQ SEQUENCE 1574 AA; 165445 MW; 2B48533D8F77F6E7 CRC64;



Query Match 38.1%; Score 1372.5; DB 11; Length 1574; 

Best Local Similarity 42.2%; Pred. No. 1.7e-122; 

Matches 230; Conservative 60; Mismatches 199; Indels 56; Gaps 11; 

Qy 89 RRKSQCCPGFYESGEMCVPHCADKCVH-GRCIAPNT--CQCEPGWGGTNCSSACDGDHWG 145

I : I I M I : I I II I I III I :| Mi Ml 

Db 816 RCQDTCSAGWYGTG--CQIRCA--CANDGHC-DPTTGRCSCAPGWTGLSCQRACDSGHWG 870

Qy 146 PHCTSRCQCKNG-ALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCD 204

II II I I : : : I I I II : I M I I I I II I I : I : I : : I I I I 

Db 871 PDCIHPCNCSAGHGNCDAVSGLCLCEAGYEGPRCEQSCRQGYYGPSCEQKCRCEHGAACD 930 

Qy 205 HVTGECRCPPGYTGAFCED------------------------------LCPPGKHGPQC 234

MM I II i: 1:111 Ml I: MM 

Db 931 HVSGACTCPAGWRGSFCEHACPAGFFGLDCDSACNCSAGAPCDAVTGSCICPAGRWGPRC 990 

Qy 235 EQRCP-------------CQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQEC 281

IN III I II I M I MM I I I I I M I I I I 

Db 991 AQSCPPLTFGLNCSQICTCFNGASCDSVTGQCHCAPGWMGPTCLQACPPGLYGKNCQHSC 1050 

Qy 282 QCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLC 341

I M I II II I I Mil I : M I II I I I : M I I : M II I 
Db 1051 LCRNGGRCDPILGQCTCPEGWTGLACENECLPGHYAAGCQLNCSCLHGGICDRLTGHCLC 1110 

Qy 342 EAGFAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCS 401

I I : I : : I : : I I : I : | : : | | III : : I I I III I M : I 

Db 1111 PAGWTGDKCQSS-CVSGTFGVHCEEHCAC--RKGASCHHVTGACFCPPGWRGPHCEQACP 1167

Qy 402 PGFYGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKND 461

I : : I I I I I I I M I I I M I I II I : I I I I M M II : 
Db 1168 RGWFGEACAQRCLCPTNASCHHVTGECRCPPGFTGLSCEQACQPGTFGKDCEHLCQCPGE 1227 

Qy 462 A-VCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGW 520

II I M I M M I I II I M M M I M M II I : III I : 
Db 1228 TWACDPASGVCTCAAGYHGTGCLQRCPSGRYGPGCEHICKCLNGGTCDPATGACYCPAGF 1287

Qy 521 RGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPN 580

I I I I I : I Ml II I I M I I I I M I I : I : I M 

Db 1288 LGADCSLACPQGRFGPSCAHVCACRQGAACDPVSGACICSPGKTGVRCEHGCPQDRFGKG 1347 

Qy 581 CSLPC 585 

I I I 

Db 1348 CELKC 1352 



RESULT 11 
Q8ND91 

ID Q8ND91 PRELIMINARY; PRT; 626 AA.
AC Q8ND91;
DT 01-OCT-2002 (TrEMBLrel. 22, Created)
DT 01-OCT-2002 (TrEMBLrel. 22, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE Hypothetical protein (Fragment).
GN DKFZP434L121.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=Testis;
RA Poustka A., Wellenreuther R., Mewes H.W., Weil B., Wiemann S.;
RL Submitted (JUL-2002) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AL834326; CAD38994.1;
DR GO; GO:0016020; C:membrane; IEA.
DR GO; GO:0045285; C:ubiquinol-cytochrome-c reductase complex; IEA.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR GO; GO:0008121; F:ubiquinol-cytochrome-c reductase activity; IEA.
DR GO; GO:0006118; P:electron transport; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR006210; IEGF.
DR InterPro; IPR002049; Laminin_EGF.
DR InterPro; IPR005805; Rieske.
DR Pfam; PF00008; EGF; 7.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00181; EGF; 11.
DR SMART; SM00180; EGF_Lam; 11.
DR PROSITE; PS00022; EGF_1; 11.
DR PROSITE; PS01186; EGF_2; 11.
DR PROSITE; PS00200; RIESKE_2; 1.
KW Hypothetical protein; EGF-like domain; Laminin EGF-like domain.
FT NON_TER 1 1
SQ SEQUENCE 626 AA; 64059 MW; C166FE1BD2A949F9 CRC64;

Query Match 37.8%; Score 1362; DB 4; Length 626;
Best Local Similarity 63.1%; Pred. No. 6.8e-122;
Matches 205; Conservative 40; Mismatches 80; Indels 0; Gaps 0


Qy 261 GTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLC 320

1 1 1 M 1 1 1 1 1 : 1 M | : | M : | | | | II 1 1 1 1 : 1 1 I : || | : | | | | : : | |

Db 6 GAVCAQPCPPGTFGQNCSQDCPCHHGGQCDHVTGQCHCTAGYMGDRCQEECPFGSFGFQC 65

Qy 321 AETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHP 380

: : 1 1 1 1 1 : 1 : 1 1 1 1 1 1 : 1 I I : I | M 1 1 1 : 1 1 I | | : | | Ml!

Db 66 SQRCDCHNGGQCSPTTGACECEPGYKGPRCQERLCPEGLHGPGCTLPCPCDADNTISCHP 125

Qy 381 MSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNGADCDSVTGKCTCAPGFKGIDCS 440

: : 1 1 1 : 1 1 1 1 1 : 1 1 I : | : : | : 1 1 1 : 1 1 1 I M M 1 1 1 :

Db 126 VTGACTCQPGWSGHHCNESCPVGYYGDGCQLPCTCQNGADCHSITGGCTCAPGFMGEVCA 185

Qy 441

Ml I I I I I I I I I I I I I I I ! I | || | : | | :: | | || I I I

Db 186

Qy 501 CLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHCRCL 560

I II II* : I I : I : I III I : I | | I I I I I : I I I I : I I I I I I I I I I I I | | I II

Db 246 CANGAACSPIDGSCSCTPGWLGDTCELPCPDGTFGLNCSEHCDCSHADGCDPVTGHCCCL 305

Qy 561

I I : I : I I I I I I I I I I I

Db 306



RESULT 12 
Q8BX64 

ID Q8BX64 PRELIMINARY; PRT; 299 AA.
AC Q8BX64;
DT 01-MAR-2003 (TrEMBLrel. 23, Created)
DT 01-MAR-2003 (TrEMBLrel. 23, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE MEGF11 protein.
GN 2410080H04RIK.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=C57BL/6J; TISSUE=Cerebellum;
RX MEDLINE=22354683; PubMed=12466851;
RA The FANTOM Consortium,
RA the RIKEN Genome Exploration Research Group Phase I & II Team;
RT "Analysis of the mouse transcriptome based on functional annotation of
RT 60,770 full-length cDNAs.";
RL Nature 420:563-573(2002).
DR EMBL; AK048840; BAC33471.1; -.
DR MGD; MGI:1920951; 2410080H04Rik.
DR GO; GO:0016020; C:membrane; IEA.
DR GO; GO:0045285; C:ubiquinol-cytochrome-c reductase complex; IEA.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR GO; GO:0008121; F:ubiquinol-cytochrome-c reductase activity; IEA.
DR GO; GO:0006118; P:electron transport; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR006210; IEGF.
DR InterPro; IPR002049; Laminin_EGF.
DR InterPro; IPR005805; Rieske.
DR Pfam; PF00008; EGF; 5.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00181; EGF; 5.
DR SMART; SM00180; EGF_Lam; 3.
DR PROSITE; PS00022; EGF_1; 4.
DR PROSITE; PS01186; EGF_2; 4.
DR PROSITE; PS00200; RIESKE_2; 1.
SQ SEQUENCE 299 AA; 32479 MW; B5F27B185AE13D1A CRC64;

Query Match 37.2%; Score 1341; DB 11; Length 299;
Best Local Similarity 70.2%; Pred. No. 3.3e-120;
Matches 205; Conservative 34; Mismatches 53; Indels 0; Gaps 0



Qy 15 LLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNWFKCTRHRV 74

II = I I I I I M I I I I I I I I : I I I | | | | | | | M I I I I I M I I I I I I I I I I : 

Db 8 LLVFLLQAALALNPEDPNVCSHWESYAVTVQESYAHPFDQIYYTRCADILNWFKCTRHRI 67 

Qy 75 SYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGRCIAPNTCQCEPGWGGTN 134

I I : I I I I I : I I I I I : I I I I I I : I I : I : | : | | : : | : | | | | : : | : | | MINI : 

Db 68 SYKTAYRRGLRTMYRRRSQCCPGYYENGDFCIPLCTEECMHGRCVSPDTCHCEPGWGGPD 127

Qy 135 CSSACDGDHWGPHCTSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQR 194 

IN II : I I I I I I :: I I I I : I I I I I I M I I I I II I I I I I || I : I I I : I I 
Db 128 CSSGCDSEHWGPHCSNRCQCQNGALCNPITGACVCAPGFRGWRCEELCAPGTHGKGCQLL 187 

Qy 195 CQCQNGATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECS 254

Ml : I I • I I I I I I I I I I I I : I I : I I I I I || || I | | | | | M |||:|||): 
Db 188 CQCHHGASCDPRTGECLCAPGYTGVYCEELCPPGSHGAHCELRCPCQNGGTCHHITGECA 247

Qy 255 CPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGER 306 

II II I M I I I I I I I : I I I I : I I I : I I II I I I I I I : | | | : | 

Db 248 CPPGWTGAVCAQPCPPGTFGQNCSQDCPCHHGGQCDHVTGQCHCTAGYMGDR 299

RESULT 13 
Q9TVQ2 



ID Q9TVQ2 PRELIMINARY; PRT; 1664 AA.
AC Q9TVQ2;
DT 01-MAY-2000 (TrEMBLrel. 13, Created)
DT 01-MAY-2000 (TrEMBLrel. 13, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE Y64G10A.7 protein.
GN Y64G10A.7.
OS Caenorhabditis elegans.
OC Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
OC Rhabditidae; Peloderinae; Caenorhabditis.
OX NCBI_TaxID=6239;
RN [1]
RP SEQUENCE FROM N.A.
RA Mortimore B.J.;
RL Submitted (APR-1999) to the EMBL/GenBank/DDBJ databases.
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE=99069613; PubMed=9851916;
RA none;
RT "Genome sequence of the nematode C. elegans: A platform for
RT investigating biology.";
RL Science 282:2012-2018(1998).
RN [3]
RP SEQUENCE FROM N.A.
RA Ainscough R.;
RL Submitted (MAY-1999) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AL117206; CAB60454.1; -.
DR EMBL; AL110498; CAB60454.1; JOINED.
DR EMBL; AL110498; CAB57911.1;
DR EMBL; AL117206; CAB57911.1; JOINED.
DR HSSP; P00736; 1APQ.
DR WormPep; Y64G10A.7; CE24549.
DR GO; GO:0005509; F:calcium ion binding; IEA.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR InterPro; IPR000152; Asx_hydroxyl_S.
DR InterPro; IPR001881; EGF_Ca.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR002049; Laminin_EGF.
DR Pfam; PF00008; EGF; 22.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00179; EGF_CA; 4.
DR PROSITE; PS00010; ASX_HYDROXYL; 4.
DR PROSITE; PS00022; EGF_1; 22.
DR PROSITE; PS01186; EGF_2; 24.
DR PROSITE; PS01187; EGF_CA; 3.
KW EGF-like domain.
SQ SEQUENCE 1664 AA; 179279 MW; A69F093B4C705832 CRC64;

Query Match 37.2%; Score 1340; DB 5; Length 1664;
Best Local Similarity 36.4%; Pred. No. 2.3e-119;
Matches 237; Conservative 70; Mismatches 218; Indels 126; Gaps 13;

Qy 16 LCHWI GTASPL NLEDPNVCSHWESYSVTVQES YPHPFDQI YYTSC 60

•II: III : | 

Db 824 VCHHVTGTCTCLPGKTGPLCDQSCAPNTYGPN-CAH TC 860 

Qy 61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCAD 111

: : I I I ! I I I I I I I I 

Db 861 S-CVNGAKCDESDGS CHCTPGFY — GATCSEVCPTGRFGIDCMQ 901 

Qy 112 --KCVHGR-CIAPN-TCQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGAC 167

M : I I I : I : I I II I I | | : | |: :| | :| | : | | | 

Db 902 LCKCQNGAICDTSNGSCECAPGWSGKKCDKACAPGTFGKDCSKKCDCADGMHCDPSDGEC 961 

Qy 168 HCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGAT 202

I I : I : I :: I : | : | | I MINI 
Db 962 ICPPGKKGHKCDETCDSGLFGAGCKGICSCQNGATCDSVTGSCECRPGWRGKKCDRPCPD 1021 

Qy 203 CDHVTGECRCPPGYTGAFCEDLCPPGKHGPQC 234 

I I I I I I I I I I I I : I I I : I I | : | j | 
Db 1022 GRFGEGCNAICDCTTTNDTSMYNPFVARCDHVTGECRCPAGWTGPDCQTSCPLGRHGEGC 1081 

Qy 235 EQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATG 294

I I I I i 11111111:11 I I I I I : I I I : I I : I f | : | 
Db 1082 RHSCQCSNGASCDRVTGFCDCPSGFMGKNCESECPEGLWGSNCMKHCLCMHGGECNKENG 1141

Qy 295 QCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGGKCYHVSGACLCEAGFAGERCEARL 354

M I : I I I : I I I : I I I : I- I I I I : I I I I : : I I | I : 
Db 1142 DCECIDGWTGPSCEFLCPFGQFGRNCAQRCNCKNGASCDRKTGRCECLPGWSGEHCE-KS 1200

Qy 355 CPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKPGWSGLYCNETCSPGFYGEACQQICS 414

I I M I I M I I : II I : I : I I I I I I | | : : | | | | 

Db 1201 CVSGHYGAKCEETCEC--ENGALCDPISGHCSCQPGWRGKKCNRPCLKGYFGRHCSQSCR 1258

Qy 415 CQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCK 474 

II I I : : 1 : I I I : I I : I I i I : I : I I : | | : : : | : I 1 I I 

Db 1259 CANSKSCDHISGRCQCPKGYAGHSCTELCPDGTFGESCSQKCDCGENSMCDAISGKCFCK 1318 



Qy 475 AGWHGVDCSIRCPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGTY 534



' III I I : ||:: I : I I I I : | I | : | [ 

Db 1319 PGHSGSDCKSGCVQGRFGPDCNQLCSCENGGVCDSSTGSCVCPPGYIGTKCEIACQSDRF 1378

Qy 535 GLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585
I 1:1:1: I I I I I I I I I : : I : | : | | | | | : | | | 

Db 1379 GPTCEKICNCENGGTCDRLTGQCRCLPGFTGMTCNQVCPEGRFGAGCKEKC 1429



RESULT 14 
Q63404 

ID Q63404 PRELIMINARY; PRT; 220 AA.
AC Q63404;
DT 01-NOV-1996 (TrEMBLrel. 01, Created)
DT 01-NOV-1996 (TrEMBLrel. 01, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE (Clone REM4) ORF (Fragment).
OS Rattus norvegicus (Rat).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
OX NCBI_TaxID=10116;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=Holtzman; TISSUE=Brain;
RX MEDLINE=96235155; PubMed=8642059;
RA Asakura K., Pogulis R.J., Pease L.R., Rodriguez M.;
RT "A monoclonal autoantibody which promotes central nervous system
RT remyelination is highly polyreactive to multiple known and novel
RT antigens.";
RL J. Neuroimmunol. 65:11-19(1996).
DR EMBL; L41686; AAB05844.1;
DR HSSP; P01132; 1EGF.
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR009030; Grow_fac_recep.
DR InterPro; IPR002049; Laminin_EGF.
DR Pfam; PF00008; EGF; 3.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00180; EGF_Lam; 2.
DR PROSITE; PS00022; EGF_1; 5.
DR PROSITE; PS01186; EGF_2; 5.
KW EGF-like domain; Laminin EGF-like domain.
FT NON_TER 1 1
FT NON_TER 220 220
SQ SEQUENCE 220 AA; 23231 MW; 3119D391EAF64372 CRC64;

Query Match 36.4%; Score 1309; DB 11; Length 220;
Best Local Similarity 95.4%; Pred. No. 2.8e-117;
Matches 209; Conservative 4; Mismatches 6; Indels 0; Gaps 0;

Qy 200 GATCDHVTGECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGW 259 

I I I I I I : I I I I I I | M | | | M I I I I II I I I I I I II I I I I | | | | | | M | | | | | | | | | | | | 
Db 2 GATCDHITGECRCSPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGW 61 

Qy 260 MGTVCGQPCPEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVL 319

M M I I I I II I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | I I I I I II 

Db 62 MGTVCGQPCPEGRFGKNCSQECQCHNGGACDAATGQCHCSPGYTGERCQDECPVGTYGVR 121 



Qy 320 CAETCQCVNGGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCH 379 

I I I I I : I I I I I I I I I I I I I II I I I I : II I I I I I I I I I I I I II I I I I I I I I I : I I | I I I 
Db 122 CAETCRCVNGGKCYHVSGTCLCEAGFSGEFCEARLCPEGLYGIKCDKRCPCHLDNTHSCH 181 

Qy 380 PMSGECACKPGWSGLYCNETCSPGFYGEACQQICSCQNG 418 

I I I I I I I I I I I I I I I I II I I II I I I II I I I I I I I I I I I 
Db 182 PMSGECGCKPGWSGLYCNETCSPGFYGEACQQICSCQNG 220 
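
The "Best Local Similarity" figures in this report appear to equal Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). That relationship is inferred from the numbers printed here, not taken from GenCore documentation, and the function name below is hypothetical. A minimal Python sketch under that assumption:

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Percent identity over all alignment columns.

    Assumption: every column is counted exactly once as a match,
    a conservative substitution, a mismatch, or an indel.
    """
    columns = matches + conservative + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / columns


# Counts copied from the statistics lines of RESULT 11, 13 and 14.
for counts in [(205, 40, 80, 0), (237, 70, 218, 126), (209, 4, 6, 0)]:
    print(counts, f"{best_local_similarity(*counts):.1f}%")
```

Rounded to one decimal, these reproduce the 63.1%, 36.4% and 95.4% values reported for those results.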



RESULT 15 
Q9W0A0 

ID Q9W0A0 PRELIMINARY; PRT; 881 AA.
AC Q9W0A0;
DT 01-MAY-2000 (TrEMBLrel. 13, Created)
DT 01-JUN-2003 (TrEMBLrel. 24, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE CG2086-PB.
GN DRPR OR CG2086.
OS Drosophila melanogaster (Fruit fly).
OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;
OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;
OC Ephydroidea; Drosophilidae; Drosophila.
OX NCBI_TaxID=7227;
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE=20196006; PubMed=10731132;
RA Adams M.D., Celniker S.E., Holt R.A., Evans C.A., Gocayne J.D.,
RA Amanatides P.G., Scherer S.E., Li P.W., Hoskins R.A., Galle R.F.,
RA George R.A., Lewis S.E., Richards S., Ashburner M., Henderson S.N.,
RA Sutton G.G., Wortman J.R., Yandell M.D., Zhang Q., Chen L.X.,
RA Brandon R.C., Rogers Y.H., Blazej R.G., Champe M., Pfeiffer B.D.,
RA Wan K.H., Doyle C., Baxter E.G., Helt G., Nelson C.R., Gabor G.L.,
RA Abril J.F., Agbayani A., An H.J., Andrews-Pfannkoch C., Baldwin D.,
RA Ballew R.M., Basu A., Baxendale J., Bayraktaroglu L., Beasley E.M.,
RA Beeson K.Y., Benos P.V., Berman B.P., Bhandari D., Bolshakov S.,
RA Borkova D., Botchan M.R., Bouck J., Brokstein P., Brottier P.,
RA Burtis K.C., Busam D.A., Butler H., Cadieu E., Center A., Chandra I.,
RA Cherry J.M., Cawley S., Dahlke C., Davenport L.B., Davies P.,
RA de Pablos B., Delcher A., Deng Z., Mays A.D., Dew I., Dietz S.M.,
RA Dodson K., Doup L.E., Downes M., Dugan-Rocha S., Dunkov B.C., Dunn P.,
RA Durbin K.J., Evangelista C.C., Ferraz C., Ferriera S., Fleischmann W.,
RA Fosler C., Gabrielian A.E., Garg N.S., Gelbart W.M., Glasser K.,
RA Glodek A., Gong F., Gorrell J.H., Gu Z., Guan P., Harris M.,
RA Harris N.L., Harvey D., Heiman T.J., Hernandez J.R., Houck J.,
RA Hostin D., Houston K.A., Howland T.J., Wei M.H., Ibegwam C.,
RA Jalali M., Kalush F., Karpen G.H., Ke Z., Kennison J.A., Ketchum K.A.,
RA Kimmel B.E., Kodira C.D., Kraft C., Kravitz S., Kulp D., Lai Z.,
RA Lasko P., Lei Y., Levitsky A.A., Li J., Li Z., Liang Y., Lin X.,
RA Liu X., Mattei B., McIntosh T.C., McLeod M.P., McPherson D.,
RA Merkulov G., Milshina N.V., Mobarry C., Morris J., Moshrefi A.,
RA Mount S.M., Moy M., Murphy B., Murphy L., Muzny D.M., Nelson D.L.,
RA Nelson D.R., Nelson K.A., Nixon K., Nusskern D.R., Pacleb J.M.,
RA Palazzolo M., Pittman G.S., Pan S., Pollard J., Puri V., Reese M.G.,
RA Reinert K., Remington K., Saunders R.D., Scheeler F., Shen H.,
RA Shue B.C., Siden-Kiamos I., Simpson M., Skupski M.P., Smith T.,
RA Spier E., Spradling A.C., Stapleton M., Strong R., Sun E.,
RA Svirskas R., Tector C., Turner R., Venter E., Wang A.H., Wang X.,
RA Wang Z.Y., Wassarman D.A., Weinstock G.M., Weissenbach J.,
RA Williams S.M., Woodage T., Worley K.C., Wu D., Yang S., Yao Q.A., Ye J.,
RA Yeh R.F., Zaveri J.S., Zhan M., Zhang G., Zhao Q., Zheng L.,
RA Zheng X.H., Zhong F.N., Zhong W., Zhou X., Zhu S., Zhu X., Smith H.O.,
RA Gibbs R.A., Myers E.W., Rubin G.M., Venter J.C.;
RT "The genome sequence of Drosophila melanogaster.";
RL Science 287:2185-2195(2000).
RN [2]
RP SEQUENCE FROM N.A.
RA Misra S., Crosby M.A., Matthews B.B., Bayraktaroglu L., Campbell K.,
RA Hradecky P., Huang Y., Kaminker J.S., Prochnik S.E., Smith C.D.,
RA Tupy J.L., Bergman C.M., Berman B.P., Carlson J.W., Celniker S.E.,
RA Clamp M.E., Drysdale R.A., Emmert D., Frise E., de Grey A.D.N.J.,
RA Harris N.L., Kronmiller B., Marshall B., Millburn G.H., Richter J.,
RA Russo S., Searle S.M.J., Smith E., Shu S., Smutniak F.,
RA Whitfield E.J., Ashburner M., Gelbart W.M., Rubin G.M., Mungall C.J.,
RA Lewis S.E.;
RT "Annotation of Drosophila melanogaster genome.";
RL Submitted (MAR-2000) to the EMBL/GenBank/DDBJ databases.
RN [3]
RP SEQUENCE FROM N.A.
RA FlyBase;
RL Submitted (SEP-2002) to the EMBL/GenBank/DDBJ databases.
RN [4]
RP SEQUENCE FROM N.A.
RA FlyBase;
RL Submitted (JAN-2003) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AE003472; AAF47553.2;
DR GO; GO:0005198; F:structural molecule activity; IEA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR006210; IEGF.
DR InterPro; IPR003006; Ig_MHC.
DR InterPro; IPR002049; Laminin_EGF.
DR Pfam; PF00008; EGF; 7.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00181; EGF; 12.
DR SMART; SM00180; EGF_Lam; 11.
DR PROSITE; PS00022; EGF_1; 11.
DR PROSITE; PS01186; EGF_2; 13.
DR PROSITE; PS00290; IG_MHC; 1.
SQ SEQUENCE 881 AA; 96380 MW; 52196D164F52F5C1 CRC64;

Query Match 35.8%; Score 1290.5; DB 5; Length 881;
Best Local Similarity 35.5%; Pred. No. 6.6e-115;
Matches 210; Conservative 56; Mismatches 169; Indels 157; Gaps 3

Qy 151 RCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTGEC 210 

: I I I I : I 1 : I I I I I : I I I I I : I : I : I : : I : I : I I I I I : I I I 
Db 2 QCDCLNNAVCEPFSGDCECAKGYTGARCADICPEGFFGANCSEKCRCENGGKCHHVSGEC 61 

Qy 211 RCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCPE 270 

: I I I : I I I : I I I I I I I I : I I I I I I I I I I I I I I I I I II 
Db 62 QCAPGFTGPLCDMRCPDGKHGAQCQQDCPCQNDGKCQPETGACMCNPGWTGDVCANKCPV 121 

Qy 271 GRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNGG 330 

I : I I : I : I : I I I I I I 1 I M I I I I I I I : I I I I : I I I I 
Db 122 GSYGPGCQESCECYKGAPCHHITGQCECPPGYRGERCFDECQLNTYGFNCSMTCDCANDA 181 



Qy 331 KCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCH 379 

I : I I : I I : I : I I : I I I : I : : I I : I : I II 

Db 182 MCDRAI^IGTCICNPGWTGAKCAERICEANKYGLDCNRTCECDMEHTDLCHPETGNCQCSIG 241 

Qy 380 PMSGECACKPGWSGLYCNETCSPGFYGE 407

I :: I I I I I I I | I : | | | : | : 
Db 242 WSSAQCTRPCTFLRYGPNCELTCNCKNGAKCSPVNGTCLCAPGWRGPTCEESCEPGTFGQ 301 

Qy 408 ACQQICSCQNGADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPV 467

I I I I I I I I : 11:11 I : I I I Ml : I : I : I I I : I I : I 
Db 302 DCALRCDCQNGAKCEPETGQCLCTAGWKNIKCDRPCDLNHFGQDCAKVCDCHNNAACNPQ 361 

Qy 468 DGSCTCKAGW 477

MINI III 

Db 362 NGSCTCAAGWTGERCERKCDTGKFGHDCAQKCQCDFNNSLACDATNGRCVCKQDWGVCRC 421

Qy 478 HGVDCSIR 485 

: I : : I I 

Db 422 LNNSSCDPDSGNCICSAGWTGADCAEPCPPGFYGMECKERCPEILHGNKSCDHITGEILC 481 

Qy 486 CPSGTWGFGCNLTCQCLNGGACNTLDGTCTCAPGWRGEKCELPCQDGT 533 

I I : I : I I I 1 I I : 1 I I I : I I I I I I I I I I 
Db 482 RTGYIGLTCEHPCPAGLYGPGCKLKCNCEHGGECNHVTGQCQCLPGWTGSNCNESCPTDT 541

Qy 534 YGLNCAERCDCSHADGCHPTTGHCRCLPGWSGVHCDSVCAEGRWGPNCSLPC 585 

II 11:1111 I III I I I I | | | | | | : | : | | 

Db 542 YGQGCAQRCRCVHHKVCRKADGMCICETGWSGTRCDEVCPEGFYGEHCMNTC 593 

Search completed: March 26, 2004, 16:11:12 
Job time : 30.5191 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 



Run on: March 26, 2004, 15:58:50 ; Search time 9.84589 Seconds
(without alignments)
3099.072 Million cell updates/sec

Title: US-10-092-390-4
Perfect score: 3601
Sequence : 1 MVISLNSCLSFICLLLCHWI HCDSVCAEGRWGPNCSLPCY 586

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5

Searched: 141681 seqs, 52070155 residues

Total number of hits satisfying chosen parameters: 141681

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database : SwissProt 42:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
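
The summary listing reports a Query Match percentage next to each score. In this report those percentages are consistent with the hit's score expressed as a fraction of the perfect score (3601 for this query). That reading is inferred from the printed figures, not from GenCore documentation, and the names below are hypothetical. A minimal Python sketch under that assumption:

```python
# Assumption: Query Match = 100 * Score / Perfect score.
PERFECT_SCORE = 3601  # "Perfect score" from the run header


def query_match_percent(score: float, perfect: float = PERFECT_SCORE) -> float:
    """Return a hit's score as a percentage of the perfect score."""
    if perfect <= 0:
        raise ValueError("perfect score must be positive")
    return 100.0 * score / perfect


# Scores of the first three SwissProt hits in this run.
for score in (843.5, 813.0, 808.0):
    print(f"{score:7.1f} -> {query_match_percent(score):.1f}%")
```

Rounded to one decimal, these give 23.4%, 22.6% and 22.4%, matching the first three summary rows.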

SUMMARIES 



Result           Query
   No.   Score   Match  Length  DB  ID           Description

     1   843.5    23.4     830   1  SREC_HUMAN   Q14162 homo sapien
     2   813      22.6     833   1  SRC2_MOUSE   P59222 mus musculu
     3   808      22.4     870   1  SRC2_HUMAN   Q96gp6 homo sapien
     4   719      20.0    2524   1  NOTC_XENLA   P21783 xenopus lae
     5   717      19.9    2437   1  NTC1_BRARE   P46530 brachydanio
     6   697      19.4    2321   1  NTC3_HUMAN   Q9um47 homo sapien
     7   693      19.2    2531   1  NTC1_MOUSE   Q01705 mus musculu
     8   685.5    19.0    2318   1  NTC3_MOUSE   Q61982 mus musculu
     9   685.5    19.0    2471   1  NTC2_RAT     Q9qw30 rattus norv
    10   682.5    19.0    2319   1  NTC3_RAT     Q9r172 rattus norv
    11   682      18.9    4289   1  TENX_HUMAN   P22105 homo sapien
    12   677      18.8    2470   1  NTC2_MOUSE   O35516 mus musculu
    13   677      18.8    2703   1  NOTC_DROME   P07207 drosophila
    14   675.5    18.8    2003   1  NTC4_HUMAN   Q99466 homo sapien
    15   675      18.7    2531   1  NTC1_RAT     Q07008 rattus norv
    16   667.5    18.5    1213   1  JAG3_BRARE   Q90y54 brachydanio
    17   666.5    18.5    2471   1  NTC2_HUMAN   Q04721 homo sapien
    18   664.5    18.5    1064   1  FBP1_STRPU   P10079 strongyloce
    19   662      18.4    2556   1  NTC1_HUMAN   P46531 homo sapien
    20   658      18.3    1964   1  NTC4_MOUSE   P31695 mus musculu
    21   646      17.9    1808   1  TENA_CHICK   P10039 gallus gall
    22   644.5    17.9    2201   1  TENA_HUMAN   P24821 homo sapien
    23   638      17.7    3695   1  LMA5_HUMAN   O15230 homo sapien
    24   633      17.6    1238   1  JAG2_HUMAN   Q9y219 homo sapien
    25   631.5    17.5    2139   1  CRB_DROME    P10040 drosophila
    26   631      17.5    1247   1  JAG2_MOUSE   Q9qye5 mus musculu
    27   629      17.5    1218   1  JAG1_HUMAN   P78504 homo sapien
    28   619.5    17.2    1202   1  JAG2_RAT     P97607 rattus norv
    29   616      17.1    1219   1  JAG1_RAT     Q63722 rattus norv
    30   615      17.1    1242   1  JAG1_BRARE   Q90y57 brachydanio
    31   614      17.1    1218   1  JAG1_MOUSE   Q9qxx0 mus musculu
    32   611      17.0    1746   1  TENA_PIG     Q29116 sus scrofa
    33   593.5    16.5    3672   1  LML2_CAEEL   Q21313 caenorhabdi
    34   587      16.3    1408   1  SERR_DROME   P18168 drosophila
    35   586      16.3    3718   1  LMA5_MOUSE   Q61001 mus musculu
    36   577.5    16.0    1801   1  LMB2_RAT     P15800 rattus norv
    37   576.5    16.0    3712   1  LMA_DROME    Q00174 drosophila
    38   567      15.7    1798   1  LMB2_HUMAN   P55268 homo sapien
    39   564.5    15.7    1799   1  LMB2_MOUSE   Q61292 mus musculu
    40   561      15.6    3106   1  LMA2_MOUSE   Q60675 mus musculu
    41   560.5    15.6    1429   1  LI12_CAEEL   P14585 caenorhabdi
    42   559      15.5    3110   1  LMA2_HUMAN   P24043 homo sapien
    43   556      15.4     833   1  DL_DROME     P10041 drosophila
    44   546.5    15.2    3084   1  LMA1_MOUSE   P19137 mus musculu
    45   536      14.9    1295   1  GLP1_CAEEL   P13508 caenorhabdi



ALIGNMENTS 



RESULT 1 
SREC_HUMAN 

ID SREC_HUMAN STANDARD; PRT; 830 AA.
AC Q14162; O43701;
DT 28-FEB-2003 (Rel. 41, Created)
DT 28-FEB-2003 (Rel. 41, Last sequence update)
DT 10-OCT-2003 (Rel. 42, Last annotation update)
DE Endothelial cells scavenger receptor precursor (Acetyl LDL receptor)
DE (Scavenger receptor class F member 1).
GN SCARF1 OR SREC OR KIAA0149.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=Umbilical vein endothelial cells;
RX MEDLINE=98058897; PubMed=9395444;
RA Adachi H., Tsujimoto M., Arai H., Inoue K.;
RT "Expression cloning of a novel scavenger receptor from human
RT endothelial cells.";
RL J. Biol. Chem. 272:31217-31220(1997).
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE=22086180; PubMed=11978792;
RA Adachi H., Tsujimoto M.;
RT "Characterization of the human gene encoding the scavenger receptor
RT expressed by endothelial cell and its regulation by a novel
RT transcription factor, endothelial zinc finger protein-2.";
RL J. Biol. Chem. 277:24014-24021(2002).
RN [3]
RP SEQUENCE FROM N.A.
RC TISSUE=Bone marrow;
RX MEDLINE=96127530; PubMed=8590280;
RA Nagase T., Seki N., Tanaka A., Ishikawa K.-I., Nomura N.;
RT "Prediction of the coding sequences of unidentified human genes. IV.
RT The coding sequences of 40 new genes (KIAA0121-KIAA0160) deduced by
RT analysis of cDNA clones from human cell line KG-1.";
RL DNA Res. 2:167-174(1995).
RN [4]
RP SEQUENCE FROM N.A.
RC TISSUE=Testis;
RX MEDLINE=22388257; PubMed=12477932;
RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,
RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,
RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,
RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,
RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,
RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,
RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,
RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,
RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,
RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,
RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,
RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,
RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,
RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,
RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M.,
RA Butterfield Y.S.N., Krzywinski M.I., Skalska U., Smailus D.E.,
RA Schnerch A., Schein J.E., Jones S.J.M., Marra M.A.;
RT "Generation and initial analysis of more than 15,000 full-length
RT human and mouse cDNA sequences.";
RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002).
CC -!- FUNCTION: Mediates the binding and degradation of acetylated low
CC density lipoprotein (Ac-LDL). Mediates heterophilic interactions,
CC suggesting a function as adhesion protein (By similarity).
CC -!- SUBUNIT: Heterophilic interaction with SREC2 via its extracellular
CC domain. The heterophilic interaction is suppressed by the presence
CC of ligand such as Ac-LDL (By similarity).
CC -!- SUBCELLULAR LOCATION: Type I membrane protein (Potential).
CC -!- TISSUE SPECIFICITY: Endothelial cells.
CC -!- SIMILARITY: Contains 6 EGF-like domains.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; D63483; BAA09770.1;
DR EMBL; BC039735; AAH39735.1; -.
FT CARBOHYD 393 393 N-LINKED (GLCNAC...) (POTENTIAL).
FT CONFLICT 662 662 R -> W (IN REF. 3).
SQ SEQUENCE 830 AA; 87430 MW; F560D9E1AA64D779 CRC64;

Query Match 23.4%; Score 843.5; DB 1; Length 830;
Best Local Similarity 36.2%; Pred. No. 1.4e-47;
Matches 158; Conservative 51; Mismatches 164; Indels 63; Gaps 15

Qy 93 QCCPGFYESGEMC-VPHC--ADKCVHGR-CIAPNTCQCEPGWGGTNCSSACDGDHWGPHC 148

MM:: : | : I I II I : I I : I : I I : I : I I I I i : I I I I 

Db 40 QCCAGWRQKDQECTIPICEGPDACQKDEVCVKPGLCRCKPGFFGAHCSSRCPGQYWGPDC 99 

Qy 149 TSRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTG 208

II I I I I I I I I I I I I I I : I I I I I 

Db 100 RESCPCHPHGQCEPATGACQCQADRWGARCEFPCACGPHGR CDPATG 146

Qy 209 ECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPC 268 

I I I I :: I I : I I I I I I I II 
Db 147 VCHCEPGWWSSTCRRPCQCNTAAARCEQ ATGACVCKPGW 185

Qy 269 PEGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVN 328 

: I : I I I I I I I : : I : I I I I : I I I : I : M 

Db 186 WGRRCSFRCNCH-GSPCEQDSGRCACRPGWWGPECQQQ CECVR 227

Qy 329 GGKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGEC-AC 387

I : I I I I I II I I I I I I I : I : : I I : : I I : I I : I 

Db 228 -GRCSAASGECTCPPGFRGARCELP-CPAGSHGVQCAHSCG-RCKHNEPCSPDTGSCESC 284

Qy 388 KPGWSGLYCNETCSPGFYGEACQQIC-SCQNGADCDSVTGKC-TCAPGFKGIDCSTPCPL 445

: I I I : I I : I I I : I I : I : 1 I I : : I I : III I I I : I I ill 

Db 285 EPGWNGTQCQQPCLPGTFGESCEQQCPHCRHGEACEPDTGHCQRCDPGWLGPRCEDPCPT 344

Qy 446 GTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGG 505

I I : I : I I I I I I I I I I : I I : I I : I I I : : I : I I 

Db 345 GTFGEDCGSTCPTCVQGSCDTVTGDCVCSAGYWGPSCNASCPAGFHGNNCSVPCECPE-G 403

Qy 506 ACNTLDGTCTCAPGWR 521

I : : I : I II 
Db 404 LCHPVSGSCQPGSGSR 419 



RESULT 2 
SRC2_MOUSE 

ID SRC2_MOUSE STANDARD; PRT; 833 AA.
AC P59222;
DT 28-FEB-2003 (Rel. 41, Created)
DT 28-FEB-2003 (Rel. 41, Last sequence update)
DT 10-OCT-2003 (Rel. 42, Last annotation update)
DE Scavenger receptor class F member 2 precursor (Scavenger receptor
DE expressed by endothelial cells 2 protein) (SREC-II).
GN SCARF2 OR SREC2.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A., AND CHARACTERIZATION.
RC STRAIN=C57BL/6J;
RX MEDLINE=22267235; PubMed=12154095;
RA Ishii J., Adachi H., Aoki J., Koizumi H., Tomita S., Suzuki T.,
RA Tsujimoto M., Inoue K., Arai H.;
RT "SREC-II, a new member of the scavenger receptor type F family,
RT trans-interacts with SREC-I through its extracellular domain.";
RL J. Biol. Chem. 277:39696-39702(2002).
CC -!- FUNCTION: Probable adhesion protein, which mediates homophilic and
CC heterophilic interactions. In contrast to SCARF1, it poorly
CC mediates the binding and degradation of acetylated low density
CC lipoprotein (Ac-LDL).
CC -!- SUBUNIT: Homophilic and heterophilic interaction via its
CC extracellular domain. Interacts with SCARF1. The heterophilic
CC interaction with SCARF1, which is stronger than the homophilic
CC interaction with itself, is suppressed by the presence of SCARF1
CC ligand such as Ac-LDL.
CC -!- SUBCELLULAR LOCATION: Type I membrane protein (Potential).
CC -!- SIMILARITY: Contains 7 EGF-like domains.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; AF522197; AAN45862.1;
DR MGD; MGI:1858430; Scarf2.
DR GO; GO:0005044; F:scavenger receptor activity; IDA.
DR GO; GO:0007157; P:heterophilic cell adhesion; IDA.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR006210; IEGF.
DR InterPro; IPR002049; Laminin_EGF.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00181; EGF; 8.
DR SMART; SM00180; EGF_Lam; 6.
DR PROSITE; PS00022; EGF_1; 7.
DR PROSITE; PS01186; EGF_2; 4.
DR PROSITE; PS50026; EGF_3; 3.
KW Cell adhesion; Receptor; Repeat; Signal; Transmembrane;
KW EGF-like domain; Glycoprotein.
FT SIGNAL     1   33  POTENTIAL.
FT CHAIN     34  833  SCAVENGER RECEPTOR CLASS F MEMBER 2
FT DOMAIN    34  433  EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 434  454  POTENTIAL.
FT DOMAIN   455  791  CYTOPLASMIC (POTENTIAL).
FT DOMAIN    68  102  EGF-LIKE 1.
FT DOMAIN   114  145  EGF-LIKE 2.
FT DOMAIN   146  174  EGF-LIKE 3.
FT DOMAIN   175  204  EGF-LIKE 4.
FT DOMAIN   205  233  EGF-LIKE 5.
FT DOMAIN   234  262  EGF-LIKE 6.
FT DOMAIN   364  395  EGF-LIKE 7.
FT DOMAIN   639  714  PRO-RICH.
FT DISULFID  72   84  POTENTIAL.
FT DISULFID  78   90  POTENTIAL.
FT DISULFID  92  101  POTENTIAL.
FT DISULFID 118  126  POTENTIAL.
FT DISULFID 120  133  POTENTIAL.
FT DISULFID 135  144  POTENTIAL.
FT DISULFID 148  155  POTENTIAL.
FT DISULFID 150  162  POTENTIAL.
FT DISULFID 164  173  POTENTIAL.
FT DISULFID 177  185  POTENTIAL.
FT DISULFID 179  192  POTENTIAL.
FT DISULFID 194  203  POTENTIAL.
FT DISULFID 207  214  POTENTIAL.
FT DISULFID 209  221  POTENTIAL.
FT DISULFID 223  232  POTENTIAL.
FT DISULFID 236  243  POTENTIAL.
FT DISULFID 238  250  POTENTIAL.
FT DISULFID 252  261  POTENTIAL.
FT DISULFID 368  376  POTENTIAL.
FT DISULFID 371  383  POTENTIAL.
FT DISULFID 385  394  POTENTIAL.
FT CARBOHYD  75   75  N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 302  302  N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 357  357  N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 395  395  N-LINKED (GLCNAC...) (POTENTIAL).
SQ SEQUENCE 833 AA; 87871 MW; 51EADEEAACAFF005 CRC64;

Query Match 22.6%; Score 813; DB 1; Length 833;
Best Local Similarity 32.6%; Pred. No. 1.3e-45;
Matches 154; Conservative 46; Mismatches 161; Indels 112; Gaps 17;



Qy 94 CCPGFYESGEMC-VPHCA--DKCVHGR-CIAPNTCQCEPGWGGTNCSSACDGDHWGPHCT 149

i I I : : I : I : I I I : I I : I I : I I I : I Mil

Db 56 CCAGWRQLGDECGIAVCEGNSTCSENEVCVRPGECRCRHGYFGANCDTKCPRQFWGPDCK 115

Qy 150 SRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTGE 209

III I : I I I I I I I II 1111:111 : I

Db 116 ERCSCHPHGQCEDVTGQCTCHA--RRW GARCEHACQCQHG-TCHPRSGA 161

Qy 210 CRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCP 269

I I I I I : I I II II I I I I I I I

Db 162 CRCEPGWWGA QCAS AC YCS AT S RCD PQTGACLCHVGW 198

Qy 270 EGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNG 329

: I : : i : : I I I I : : I : I I

Db 199 WGRSCNNQCAC-NSSPCEQQSGRCQCR 224

Qy 330 GKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLENTHS-CHPMSGECACK 388

I : : I : I I : I I : I I I i : I I I I

Db 225 ERMFGARCDRYCQC SHGRCHPVDGTCACD 253

Qy 389 PGWSGLYCNETCSPGFYGEACQQIC-SCQNGADCDSVTGKC-TCAPGFKGIDCSTPCPLG 446

I I : I I I I I MM I : : I I : I I Ml II I M I I II I

Db 254 PGYRGKYCREPCPAGFYGPGCRRRCGQCKGQQPCTWEGRCLTCEPGWNGTKCDQPCATG 313

Qy 447 TYGINCSSRC-GCKNDAVCSPVDGSCT-CKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNG 504

II I II M : I : I I I I I II I I I M M M I I I

Db 314 FYGEGCGHRCPPCRDGHACNHVTGKCTHCNAGWIGDRCETKCSNGTYGEDCAFVCSDCGS 373



Qy 505 GACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHC 557 

II: llhll I I : I I : I : : I I : I I I : I I I I I 
Db 374 GHCDFQSGRCLCSPGVHGPHCNVTCPAGLHGVDCAQACSC-HEESCDPVTGAC 425 



RESULT 3 
SRC2_HUMAN 

ID SRC2_HUMAN STANDARD; PRT; 870 AA.

AC Q96GP6; Q9BW74; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE Scavenger receptor class F member 2 precursor (Scavenger receptor 

DE expressed by endothelial cells 2 protein) (SREC-II).
GN SCARF2 OR SREC2.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE FROM N.A., AND TISSUE SPECIFICITY. 

RX MEDLINE=22267235; PubMed=12154095;
RA Ishii J., Adachi H., Aoki J., Koizumi H., Tomita S., Suzuki T.,
RA Tsujimoto M., Inoue K., Arai H.;
RT "SREC-II, a new member of the scavenger receptor type F family,
RT trans-interacts with SREC-I through its extracellular domain.";
RL J. Biol. Chem. 277:39696-39702(2002).

RN [2] 

RP SEQUENCE OF 272-870 FROM N.A., AND VARIANTS GLU-777 AND LEU-778. 

RC TISSUE=Brain; 

RX MEDLINE=22388257; PubMed=12477932;
RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,
RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,
RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,
RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,
RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,
RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,
RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,
RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,
RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,
RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,
RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,
RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,
RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,
RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,
RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M.,
RA Butterfield Y.S.N., Krzywinski M.I., Skalska U., Smailus D.E.,
RA Schnerch A., Schein J.E., Jones S.J.M., Marra M.A.;
RT "Generation and initial analysis of more than 15,000 full-length
RT human and mouse cDNA sequences.";
RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002).

CC -!- FUNCTION: Probable adhesion protein, which mediates homophilic and
CC heterophilic interactions. In contrast to SCARF1, it poorly
CC mediates the binding and degradation of acetylated low density
CC lipoprotein (Ac-LDL) (By similarity).
CC -!- SUBUNIT: Homophilic and heterophilic interaction via its
CC extracellular domain. Interacts with SCARF1. The heterophilic
CC interaction with SCARF1, which is stronger than the homophilic
CC interaction with itself, is suppressed by the presence of SCARF1
CC ligand such as Ac-LDL (By similarity).

CC -!- SUBCELLULAR LOCATION: Type I membrane protein (Potential). 

CC -!- TISSUE SPECIFICITY: Predominantly expressed in endothelial cells. 

CC Expressed in heart, placenta, lung, kidney, spleen, small 

CC intestine and ovary. 

CC -!- SIMILARITY: Contains 7 EGF-like domains. 

CC -!- CAUTION: Ref.2 sequences differ from that shown due to 
CC frameshifts in positions 750, 751 and 768. 

CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).















DR EMBL; AF522196; AAN45861.1; -.
DR EMBL; BC000584; AAH00584.1; ALT_FRAME.
DR EMBL; BC009326; AAH09326.1; ALT_FRAME.
DR Genew; HGNC:19869; SCARF2.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR006210; IEGF.
DR InterPro; IPR002049; Laminin_EGF.
DR PRINTS; PR00011; EGFLAMININ.
DR SMART; SM00181; EGF; 7.
DR SMART; SM00180; EGF_Lam; 6.
DR PROSITE; PS00022; EGF_1; 7.
DR PROSITE; PS01186; EGF_2; 4.
DR PROSITE; PS50026; EGF_3; 3.
KW Cell adhesion; Receptor; Repeat; Signal; Transmembrane;
KW EGF-like domain; Glycoprotein; Polymorphism.


FT   SIGNAL        1     43       POTENTIAL.
FT   CHAIN        44    870       SCAVENGER RECEPTOR CLASS F MEMBER 2.
FT   DOMAIN       44    441       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    442    462       POTENTIAL.
FT   DOMAIN      463    830       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       76    110       EGF-LIKE 1.
FT   DOMAIN      122    153       EGF-LIKE 2.
FT   DOMAIN      154    182       EGF-LIKE 3.
FT   DOMAIN      183    212       EGF-LIKE 4.
FT   DOMAIN      213    241       EGF-LIKE 5.
FT   DOMAIN      242    270       EGF-LIKE 6.
FT   DOMAIN      372    403       EGF-LIKE 7.
FT   DOMAIN      652    851       PRO-RICH.


FT   DISULFID     80     92       POTENTIAL.
FT   DISULFID     86     98       POTENTIAL.
FT   DISULFID    100    109       POTENTIAL.
FT   DISULFID    126    134       POTENTIAL.
FT   DISULFID    128    141       POTENTIAL.
FT   DISULFID    143    152       POTENTIAL.
FT   DISULFID    156    163       POTENTIAL.
FT   DISULFID    158    170       POTENTIAL.
FT   DISULFID    172    181       POTENTIAL.
FT   DISULFID    185    193       POTENTIAL.
FT   DISULFID    187    200       POTENTIAL.
FT   DISULFID    202    211       POTENTIAL.
FT   DISULFID    215    222       POTENTIAL.
FT   DISULFID    217    229       POTENTIAL.
FT   DISULFID    231    240       POTENTIAL.
FT   DISULFID    244    251       POTENTIAL.
FT   DISULFID    246    258       POTENTIAL.
FT   DISULFID    260    269       POTENTIAL.
FT   DISULFID    376    384       POTENTIAL.
FT   DISULFID    379    391       POTENTIAL.
FT   DISULFID    393    402       POTENTIAL.




FT   CARBOHYD     83     83       N-LINKED (GLCNAC...) (POTENTIAL).
FT   CARBOHYD    310    310       N-LINKED (GLCNAC...) (POTENTIAL).
FT   CARBOHYD    365    365       N-LINKED (GLCNAC...) (POTENTIAL).
FT   CARBOHYD    403    403       N-LINKED (GLCNAC...) (POTENTIAL).
FT   VARIANT     777    777       D -> E (in dbSNP:759611).
FT                                /FTId=VAR_015148.
FT   VARIANT     778    778       V -> L (in dbSNP:759612).
FT                                /FTId=VAR_015149.
FT   VARIANT     819    819       A -> G (in dbSNP:874100).
FT                                /FTId=VAR_015150.
FT   VARIANT     837    837       A -> G (in dbSNP:874101).
FT                                /FTId=VAR_015151.
FT   CONFLICT    474    478       MISSING (IN REF. 2).
FT   CONFLICT    626    641       ALYARVARREARPARA -> GTRPTTTWITHSTAAS (IN
FT                                REF. 2; AAH00584).
SQ SEQUENCE 870 AA; 92479 MW; DCB735A50E6E9D1F CRC64;



Query Match 22.4%; Score 808; DB 1; Length 870; 

Best Local Similarity 33.1%; Pred. No. 2.8e-45; 

Matches 156; Conservative 42; Mismatches 164; Indels 110; Gaps 17; 



Qy 94 CCPGFYESGEMC-VPHCA--DKCVHGR-CIAPNTCQCEPGWGGTNCSSACDGDHWGPHCT 149

| I I : : i : I : I I I : I hi I : i I I : I I I I I 

Db 64 CCAGWRQQGDECGIAVCEGNSTCSENEVCVRPGECRCRHGYFGANCDTKCPRQFWGPDCK 123 

Qy 150 SRCQCKNGALCNPITGACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQNGATCDHVTGE 209

II |:IMIIII II lllhlll : I 

Db 124 ELCSCHPHGQCEDVTGQCTCHA--RRW GARCEHACQCQHG-TCHPRSGA 169

Qy 210 CRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCGQPCP 269 

I I I II : I i II II I I I I I : I I 

Db 170 CRCEPGWWGA QCASACYCSATSRCDPQTGACLCHAGW 206

Qy 270 EGRFGKNCSQECQCHNGGTCDAATGQCHCSPGYTGERCQDECPVGTYGVLCAETCQCVNG 329 

: 1 : : I : : I I I | : : I : I I II I : I I ill I 

Db 207 WGRSCNNQCAC-NSSPCEQQSGRCQCR ER TFGARCDRYCQCFRG 249

Qy 330 GKCYHVSGACLCEAGFAGERCEARLCPEGLYGIKCDKRCPCHLENTHSCHPMSGECACKP 389

III: I I I I : I 

Db 250 RCHPVDGTCACEP 262

Qy 390 GWSGLYCNETCSPGFYGEACQQIC-SCQNGADCDSVTGKC-TCAPGFKGIDCSTPCPLGT 447 

I : I I I 1 i I I I I I : : I h I I : I I I I I : I I I I I 

Db 263 GYRGKYCREPCPAGFYGLGCRRRCGQCKGQQPCTVAEGRCLTCEPGWNGTKCDQPCATGF 322



Qy 448 YGINCSSRC-GCKNDAVCSPVDGSCT-CKAGWHGVDCSIRCPSGTWGFGCNLTCQCLNGG 505



Db 323 YGEGCSHRCPPCRDGHACNHVTGKCTRCNAGWTGDRCETKCSNGTYGEDCAEVCADCGSG 382

Qy 506 ACNTLDGTCTCAPGWRGEKCELPCQDGTYGLNCAERCDCSHADGCHPTTGHC 557 

I : 111:11 I I : I I : I : I I : I I I I I I I I I 

Db 383 HCDFQSGRCLCSPGVHGPHCNVTCPPGLHGADCAQACSC-HEDTCDPVTGAC 433 

RESULT 4 
NOTC_XENLA



ID NOTC_XENLA STANDARD; PRT; 2524 AA. 

AC P21783; 

DT 01-MAY-1991 (Rel. 18, Created) 

DT 01-OCT-1996 (Rel. 34, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Neurogenic locus notch protein homolog precursor (XOTCH protein).
GN XOTCH.
OS Xenopus laevis (African clawed frog).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Amphibia; Batrachia; Anura; Mesobatrachia; Pipoidea; Pipidae;

OC Xenopodinae; Xenopus. 

OX NCBI_TaxID=8355; 

RN [1] 

RP SEQUENCE FROM N.A.
RX MEDLINE=90385285; PubMed=2402639;
RA Coffman C., Harris W., Kintner C.;
RT "Xotch, the Xenopus homolog of Drosophila notch.";
RL Science 249:1438-1441(1990).

RN [2] 

RP REVISIONS TO 1759-1782. 

RA Kintner C.;
RL Submitted (JUN-1996) to the EMBL/GenBank/DDBJ databases.

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. 

CC -!- DEVELOPMENTAL STAGE: Expressed almost uniformly in early embryos. 

CC -!- SIMILARITY: Belongs to the NOTCH family. 

CC -!- SIMILARITY: Contains 36 EGF-like domains. 
CC -!- SIMILARITY: Contains 3 Lin/Notch repeats.

CC -!- SIMILARITY: Contains 6 ANK repeats. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; M33874; AAB02039.1; -. 

DR HSSP; P00740; 1EDM. 

DR InterPro; IPR002110; ANK. 

DR InterPro; IPR000152; Asx_hydroxyl_S.
DR InterPro; IPR000742; EGF_2.
DR InterPro; IPR001881; EGF_Ca.

DR InterPro; IPR001438; EGF_II. 

DR InterPro; IPR006209; EGF_like. 

DR InterPro; IPR002049; Laminin_EGF. 

DR InterPro; IPR008297; Notch.
DR Pfam; PF00023; ank; 6.
DR Pfam; PF00008; EGF; 36.
DR Pfam; PF00066; notch; 3.
DR PIRSF; PIRSF002279; Notch; 1.
DR PRINTS; PR00010; EGFBLOOD.
DR PRINTS; PR00011; EGFLAMININ.
DR PRINTS; PR01452; NOTCH.
DR SMART; SM00248; ANK; 6.
DR SMART; SM00179; EGF_CA; 24.
DR SMART; SM00004; NL; 2.
DR PROSITE; PS50297; ANK_REP_REGION; 1.
DR PROSITE; PS50088; ANK_REPEAT; 4.
DR PROSITE; PS00010; ASX_HYDROXYL; 23.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 29.
DR PROSITE; PS50026; EGF_3; 36.
DR PROSITE; PS01187; EGF_CA; 21.
KW Differentiation; Neurogenesis; Repeat; ANK repeat; EGF-like domain;
KW Transmembrane; Signal; Glycoprotein.








FT   SIGNAL        1     19       POTENTIAL.
FT   CHAIN        20   2524       NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG.
FT   DOMAIN       20   1728       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1729   1750       POTENTIAL.
FT   DOMAIN     1751   2524       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       20     57       EGF-LIKE 1.
FT   DOMAIN       58     99       EGF-LIKE 2.
FT   DOMAIN      102    140       EGF-LIKE 3.
FT   DOMAIN      141    177       EGF-LIKE 4.
FT   DOMAIN      179    215       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      217    254       EGF-LIKE 6.
FT   DOMAIN      256    292       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      294    332       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      334    370       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      371    409       EGF-LIKE 10.
FT   DOMAIN      411    449       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      451    487       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      489    525       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      527    563       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      565    600       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      602    638       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      640    675       EGF-LIKE 17.
FT   DOMAIN      677    713       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      715    750       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      752    788       EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      790    826       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      828    866       EGF-LIKE 22.
FT   DOMAIN      868    904       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      906    942       EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      944    980       EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      982   1018       EGF-LIKE 26.
FT   DOMAIN     1020   1056       EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1058   1094       EGF-LIKE 28.
FT   DOMAIN     1096   1142       EGF-LIKE 29.
FT   DOMAIN     1144   1180       EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1182   1218       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1220   1264       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1266   1304       EGF-LIKE 33.
FT   DOMAIN     1306   1346       EGF-LIKE 34.
FT   DOMAIN     1347   1383       EGF-LIKE 35.
FT   DOMAIN     1386   1424       EGF-LIKE 36.
FT   REPEAT     1441   1478       LIN/NOTCH 1.
FT   REPEAT     1479   1520       LIN/NOTCH 2.
FT   REPEAT     1521   1560       LIN/NOTCH 3.
FT   REPEAT     1876   1919       ANK 1.
FT   REPEAT     1924   1953       ANK 2.
FT   REPEAT     1957   1987       ANK 3.
FT   REPEAT     1991   2020       ANK 4.
FT   REPEAT     2024   2053       ANK 5.
FT   REPEAT     2057   2086       ANK 6.


FT   DISULFID     22     35       BY SIMILARITY.
FT   DISULFID     29     45       BY SIMILARITY.
FT   DISULFID     47     56       BY SIMILARITY.
FT   DISULFID     62     74       BY SIMILARITY.
FT   DISULFID     68     87       BY SIMILARITY.
FT   DISULFID     89     98       BY SIMILARITY.
FT   DISULFID    106    117       BY SIMILARITY.
FT   DISULFID    111    128       BY SIMILARITY.
FT   DISULFID    130    139       BY SIMILARITY.
FT   DISULFID    145    156       BY SIMILARITY.
FT   DISULFID    150    165       BY SIMILARITY.
FT   DISULFID    167    176       BY SIMILARITY.
FT   DISULFID    183    194       BY SIMILARITY.
FT   DISULFID    188    203       BY SIMILARITY.
FT   DISULFID    205    214       BY SIMILARITY.
FT   DISULFID    221    232       BY SIMILARITY.
FT   DISULFID    226    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    260    271       BY SIMILARITY.
FT   DISULFID    265    280       BY SIMILARITY.
FT   DISULFID    282    291       BY SIMILARITY.
FT   DISULFID    298    311       BY SIMILARITY.
FT   DISULFID    305    320       BY SIMILARITY.
FT   DISULFID    322    331       BY SIMILARITY.
FT   DISULFID    338    349       BY SIMILARITY.
FT   DISULFID    343    358       BY SIMILARITY.
FT   DISULFID    360    369       BY SIMILARITY.
FT   DISULFID    375    386       BY SIMILARITY.
FT   DISULFID    380    397       BY SIMILARITY.
FT   DISULFID    399    408       BY SIMILARITY.
FT   DISULFID    415    428       BY SIMILARITY.
FT   DISULFID    422    437       BY SIMILARITY.
FT   DISULFID    439    448       BY SIMILARITY.
FT   DISULFID    455    466       BY SIMILARITY.
FT   DISULFID    460    475       BY SIMILARITY.
FT   DISULFID    477    486       BY SIMILARITY.
FT   DISULFID    493    504       BY SIMILARITY.
FT   DISULFID    498    513       BY SIMILARITY.
FT   DISULFID    515    524       BY SIMILARITY.
FT   DISULFID    531    542       BY SIMILARITY.
FT   DISULFID    536    551       BY SIMILARITY.
FT   DISULFID    553    562       BY SIMILARITY.
FT   DISULFID    569    579       BY SIMILARITY.
FT   DISULFID    574    588       BY SIMILARITY.



FT   DISULFID    590    599       BY SIMILARITY.
FT   DISULFID    606    617       BY SIMILARITY.
FT   DISULFID    611    626       BY SIMILARITY.
FT   DISULFID    628    637       BY SIMILARITY.
FT   DISULFID    644    654       BY SIMILARITY.
FT   DISULFID    649    663       BY SIMILARITY.
FT   DISULFID    665    674       BY SIMILARITY.
FT   DISULFID    681    692       BY SIMILARITY.
FT   DISULFID    686    701       BY SIMILARITY.
FT   DISULFID    703    712       BY SIMILARITY.
FT   DISULFID    719    729       BY SIMILARITY.
FT   DISULFID    724    738       BY SIMILARITY.
FT   DISULFID    740    749       BY SIMILARITY.
FT   DISULFID    756    767       BY SIMILARITY.
FT   DISULFID    761    776       BY SIMILARITY.
FT   DISULFID    778    787       BY SIMILARITY.
FT   DISULFID    794    805       BY SIMILARITY.
FT   DISULFID    799    814       BY SIMILARITY.
FT   DISULFID    816    825       BY SIMILARITY.
FT   DISULFID    832    843       BY SIMILARITY.
FT   DISULFID    837    854       BY SIMILARITY.
FT   DISULFID    856    865       BY SIMILARITY.
FT   DISULFID    872    883       BY SIMILARITY.
FT   DISULFID    877    892       BY SIMILARITY.
FT   DISULFID    894    903       BY SIMILARITY.
FT   DISULFID    910    921       BY SIMILARITY.
FT   DISULFID    915    930       BY SIMILARITY.
FT   DISULFID    932    941       BY SIMILARITY.
FT   DISULFID    986    997       BY SIMILARITY.
FT   DISULFID    991   1006       BY SIMILARITY.
FT   DISULFID   1008   1017       BY SIMILARITY.
FT   DISULFID   1024   1035       BY SIMILARITY.
FT   DISULFID   1029   1044       BY SIMILARITY.
FT   DISULFID   1046   1055       BY SIMILARITY.
FT   DISULFID   1062   1073       BY SIMILARITY.
FT   DISULFID   1067   1082       BY SIMILARITY.
FT   DISULFID   1084   1093       BY SIMILARITY.
FT   DISULFID   1100   1121       BY SIMILARITY.
FT   DISULFID   1115   1130       BY SIMILARITY.



Query Match 20.0%; Score 719; DB 1; Length 2524; 

Best Local Similarity 25.9%; Pred. No. 3.6e-39; 

Matches 225; Conservative 60; Mismatches 221; Indels 364; Gaps 51;



Qy 5 LNSCLSFICL LLCHWIGTASPLNLED PNVCSHW ESYS 41 

: I I I I I I : I : : I I I : I : : I 

Db 603 INECLSKPCLNGGQCTDRENGYICTCPKGTTGVNCETKIDDCASNLCDNGKCIDKIDGYE 662 

Qy 42 VTVQESYPHPFDQIYYT SCTDILNWFKCTRHRVSYRTAYRHGEKTMYRR 90

I : I I : I I : i I I 
Db 663 CTCEPGYTGKLCNININECDSNPCRNGGTCKDQINGFTCV 702

Qy 91 KSQCCPGFYESGEMC VPHC-ADKCVHGRC IAPNTCQCEPGWGGTNC SSACD 140 

Ml i I I I : : I : I I I : I I I I I I : I I : : I : 

Db 703 CPDGYHD-HMCLSEVNECNSNPCIHGACHDGVNGYKCDCEAGWSGSNCDINNNECE 757



Qy 141 GDHWGPHCTSRCQCKNGALCNPITGA--CHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQ 198



: I I I I : I I I I I I I I I I : I I : I I 

Db 758 SN PCMNGGTCKDMTGAYICTCKAGFSGPNCQ TNINECSSN-PCL 800 

Qy 199 NGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRC PCQNGGVCHH V 249

I 11111:11 I I I I I I : I I 11:1111 

Db 801 NHGTCIDDVAGYKCNCMLPYTGAICEAVLAP CAGSPCKNGGRCKESEDFE 850 

Qy 250 TGECSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTCDAATG--QCHCSPGYTG 304

I I I I I I I I : II I I I I I I : I : I I I I I I 

Db 851 TFSCECPPGWQGQTC EIDMNECVNRPCRNGATCQNTNGS YKCNCKPGYTG 900 

Qy 305 ERCQ DECPVGT YGVLCAETCQCVNGGKCYHVSGA--CLCEAGFAGERCEARL 354

I : I : I : I I I I I I I I I I I : I I : 

Db 901 RNCEMDIDDC QPNPCHNGGSCSDGINMFFCNCPAGFRGPKCEEDINECAS 950 

Qy 355 CPEGLYGIKCDKRCPCHLE NTHSCHPMSG ECAC 387 

I I I I I : I I I : I : I 11 

Db 951 NPCKNGANCTDCVNSYTCTCQPGFSGIHCESNTPDCTESSCFNGGTC--IDGINTFTCQC 1008

Qy 388 KPGWSGLYC NE TCS PGFYGEACQQI CS CQ 416

I I : : I I I II I I I : I I I : I I : 

Db 1009 PPGFTGSYCQHDINECDSKPCLNGGTCQDSYGTYKCTCPQGYTGLNCQNLVRWCDSSPCK 1068 

Qy 417 NGADCDSVTG--KCTCAPGFKGIDCSTP 442

III : | | | : | : | | 

Db 1069 NGGKCWQTNNFYRCECKSGWTGVYCDVPSVSCEVAAKQQGVDIVHLCRNSGMCVDTGNTH 1128

Qy 443 CPLGTYGINCSSR CG CKNDAVCSPVDG--SCTCKAGWHGVDCS 483

I I I I : I hill: I I i I I I : I I I : I I 

Db 1129 FCRCQAGYTGSYCEEQVDECSPNPCQNGATCTDYLGGYSCECVAGYHGVNCSEEINECLS 1188 

Qy 484 IRCPSGTWGFGCNLT C QCLNGGACNTLDG 512 

I I I I I I : I : I I I I I 

Db 1189 HPCQNGGTCIDLINTYKCSCPRGTQGVHCEINVDDCTPFYDSFTLEPKCFNNGKCIDRVG 1248 

Qy 513 --TCTCAPGWRGEKCE LPCQD-GTYGLNCAE RCDC SH 546

1111:11:11 II II II: I I : I I 

Db 1249 GYNCICPPGFVGERCEGDVNECLSNPCDSRGTQ--NCIQLVNDYRCECRQGFTGRRCESV 1306

Qy 547 ADGC HPTTGH-CRCLPGWSGVHCD 569 

III : I I : I I I : I I : 

Db 1307 VDGCKGMPCRNGGTCAVASNTERGFICKCPPGFDGATCEYDSRTCSNLRCQNGGTCISVL 1366

Qy 570 SVCAEGRWGPNC SLPCY 586 

I I : I I I I I I I I 

Db 1367 TSSKCVCSEGYTGATCQYPVISPCASHPCY 1396 



RESULT 5 
NTC1_BRARE 

ID NTC1_BRARE STANDARD; PRT; 2437 AA.

AC P46530; 

DT 01-NOV-1995 (Rel. 32, Created) 

DT 01-NOV-1995 (Rel. 32, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Neurogenic locus notch homolog protein 1 precursor. 

GN NOTCH1A OR NOTCH. 



OS Brachydanio rerio (Zebrafish) (Danio rerio).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Actinopterygii; Neopterygii; Teleostei; Ostariophysi; Cypriniformes;

OC Cyprinidae; Danio. 

OX NCBI_TaxID=7955; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Embryo; 

RX MEDLINE=94128602; PubMed=8297791;
RA Bierkamp C., Campos-Ortega J.A.;
RT "A zebrafish homologue of the Drosophila neurogenic gene Notch and
RT its pattern of transcription during early embryogenesis.";

RL Mech. Dev. 43:87-100(1993). 

CC -!- FUNCTION: Implicated in cell fate specifications during 
CC embryo development. May be involved in the formation of the 

CC neural plate, notochord and brain vesicles. 

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. 

CC -!- DEVELOPMENTAL STAGE: Expressed in all cells in pregastrulation

CC stages. During gastrulation is differentially expressed, 

CC accumulating predominantly in the prechordal mesoderm and 

CC notochord. At the end of gastrulation, expressed along the 

CC anterior-posterior axis including the developing neural plate 

CC and differentiating mesoderm. Also present in the developing 

CC brain and head regions.

CC -!- SIMILARITY: Belongs to the NOTCH family. 

CC -!- SIMILARITY: Contains 36 EGF-like domains. 

CC -!- SIMILARITY: Contains 3 Lin/Notch repeats. 

CC -!- SIMILARITY: Contains 6 ANK repeats. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X69088; CAA48831.1; -.

DR PIR; S42612; S42612. 

DR HSSP; P00740; 1EDM. 

DR ZFIN; ZDB-GENE-990415-173; notch1a.
DR InterPro; IPR002110; ANK.
DR InterPro; IPR000152; Asx_hydroxyl_S.
DR InterPro; IPR000742; EGF_2.

DR InterPro; IPR001881; EGF_Ca. 

DR InterPro; IPR001438; EGF_II. 

DR InterPro; IPR006209; EGF_like. 

DR InterPro; IPR002049; Laminin_EGF. 

DR InterPro; IPR008297; Notch. 

DR InterPro; IPR000800; Notch_dom. 

DR Pfam; PF00023; ank; 6. 

DR Pfam; PF00008; EGF; 36. 

DR Pfam; PF00066; notch; 3. 

DR PIRSF; PIRSF002279; Notch; 1. 

DR PRINTS; PR00010; EGFBLOOD. 

DR PRINTS; PR00011; EGFLAMININ. 

DR PRINTS; PR01452; NOTCH.

Query Match 19.9%; Score 717; DB 1; Length 2437;

Best Local Similarity 25.7%; Pred. No. 4.7e-39; 

Matches 221; Conservative 60; Mismatches 208; Indels 370; Gaps 49; 

Qy 4 SLNSCLS FICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESY 48

: : I I I I : I I I : I : : I I 

Db 601 NINECLSQPCRNGGTCQDRENAYICTCPKGTTGWCEINIDD CKR 645 

Qy 49 PHPFDQIYYTSCTDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMC 105 

I I I I I : I : : I I I I : I I I I 

Db 646 -KPCD YGKCIDKINGYECV CEPGY--SGSMCNIN 676

Qy 106 VPHCA DKCVHGRC IAPNT 123

: I I : I : I I I I 

Db 677 IDDCALNPCHNGGTCIDGVNSFTCLCPDGFRDATCLSQHNECSSNPCIHGSCLDQINSYR 736 

Qy 124 CQCEPGWGGTNCSSACDGDHWGPHCTSRCQCKNGALCNPITGA--CHCAAGFRGWRCEDR 181 

I I I I I I I I : I I I I I I : I I I I i I I I : 

Db 737 CVCEAGWMGRNCDININ ECLSN-PCVNGGTCKDMTSGYLCTCRAGFSGPNCQMN 789

Qy 182 CEQGTYGNDCHQRCQCQNGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRCP 239 

I : I I I : I I I I : 1 I II I I I : : I ill 
Db 790 I NECASN- PCLNQGSCI DDVAGFKCNCMLPYTGEVCENVLAP CSPR-P 835

Qy 240 CQNGGVCHH VTGECSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTCDAA 292

I : I I I I I : I : I I : I I I I II Mill: 

Db 836 CKNGGVCRESEDFQSFSCNCPAGWQGQTCEVDI NECVRNPCTNGGVCENL 885 

Qy 293 TG--QCHCSPGYTGERCQ DECPVGT YGVLCAETCQCVNGGKCY-HVSG-ACLCEAGF 345 

I I I I : I I : I I I : hi I I I I I I hi hi Ml 

Db 886 RGGFQCRCNPGFTGALCENDIDDC EPNPCSNGGVCQDRVNGFVCVCLAGF 935 

Qy 346 AGERCEARL CPEGLYGIKCDKRCPCHLENTHSCHP- 380 

MM: | || | | | : | | : I I 



Db 936 RGERCAEDIDECVSAPCRNGGNCTDCVNSYTCSCPAGFSGINCEINTPDCTES--SCFNG 993

Qy 381 MSGECACKPGWSGLYC NE TCSPGFYGEA 408 

I I I I I : : I I I II I I I : I 

Db 994 GTCVDGISSFSCVCLPGFTGNYCQHDVNECDSRPCQNGGSCQDGYGTYKCTCPHGYTGLN 1053 

Qy 409 CQQI CS CQNGADC--DSVTGKCTCAPGFKGIDCSTP 442 

II: I I : I I I : I I I I : I I I I 

Db 1054 CQSLVRWCDSSPCKNGGSCWQQGASFTCQCASGWTGIYCDVPSVSCEVAARQQGVSVAVL 1113 

Qy 443 CPLGTYGINCSSRCG CKNDAVCSPVDG--SCTCKAGW 477

I I I I : hill: I III h 

Db 1114 CRHAGQCVDAGNTHLCRCQAGYTGSYCQEQVDECQPNPCQNGATCTDYLGGYSCECVPGY 1173 

Qy 478 HGVDCS IRCPSGTWGFGCNL TC 499 

I |: : I I I I I I I I : I 

Db 1174 HGMNCSKEINECLSQPCQNGGTCIDLVNTYKCSCPRGTQGVHCEIDIDDCSPSVDPLTGE 1233 

Qy 500 -QCLNGGACNTLDG--TCTCAPGWRGEKCE LPCQ-DGTYGLNCAE RC 542

: I I I I I I I I h I h I I II hill: M 

Db 1234 PRCFNGGRCVDRVGGYGCVCPAGFVGERCEGDVNECLSDPCDPSGSY--NCVQLINDFRC 1291

Qy 543 DCSHA DGCHPT TGH CRCLPGWSGVHCD 569 

: I : I I I I I h I I I : I I I : 

Db 1292 ECRTGYTGKRCETVFNGCKDTPCKNGGTCAVASNTKHGYICKCQPGYSGSSCEYDSQSCG 1351 

Qy 570 SVCAEGRWGPNC 581

: I I I I 
Db 1352 S L RC RN GAT CVS GH L S P RC 1370 



RESULT 6 
NTC3_HUMAN 

ID NTC3_HUMAN STANDARD; PRT; 2321 AA.

AC Q9UM47; Q9UEB3; Q9UPL3; Q9Y6L8; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Neurogenic locus notch homolog protein 3 precursor (Notch 3).
GN NOTCH3.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE FROM N.A.
RX MEDLINE=97032728; PubMed=8878478;
RA Joutel A., Corpechot C., Ducros A., Vahedi K., Chabriat H., Mouton P.,
RA Alamowitch S., Domenga V., Cecillion M., Marechal E., Maciazek J.,
RA Vayssiere C., Cruaud C., Cabanis E.-A., Ruchoux M.M., Weissenbach J.,
RA Bach J.-F., Bousser M.-G., Tournier-Lasserve E.;
RT "Notch3 mutations in CADASIL, a hereditary adult-onset condition
RT causing stroke and dementia.";
RL Nature 383:707-710(1996).

RN [2] 

RP SEQUENCE FROM N.A.
RA Gunel M., Artavanis-Tsakonas S.;



RL Submitted (APR-1998) to the EMBL/GenBank/DDBJ databases. 

RN [3] 

RP SEQUENCE FROM N.A. 

RA Lamerdin J.E., McCready P.M., Skowronski E., Adamson A.W.,
RA Burkhart-Schultz K., Gordon L., Kyle A., Ramirez M., Stilwagen S.,
RA Phan H., Velasco N., Games J., Danganan L., Poundstone P.,
RA Christensen M., Georgescu A., Avila J., Liu S., Attix C., Andreise T.,
RA Trankheim M., Amico-Keller G., Coefield J., Duarte S., Lucas S.,
RA Bruce R., Thomas P., Quan G., Kronmiller B., Arellano A.,
RA Montgomery M., Ow D., Nolan M., Trong S., Kobayashi A., Olsen A.S.,
RA Carrano A.V.;
RT "Sequence analysis of an 1.5 Mb olfactory receptor (OLFR) cluster in
RT 19p13.1.";
RL Submitted (MAY-1998) to the EMBL/GenBank/DDBJ databases.

RN [4] 

RP VARIANTS CADASIL TYR-49; CYS-71; CYS-90; CYS-110; CYS-133; CYS-141; 

RP ARG-146; CYS-153; CYS-169; CYS-171; CYS-182; ARG-185; SER-212; 

RP GLY-222; TYR-224; CYS-258; TYR-542; CYS-558; CYS-578; CYS-728; 

RP CYS-985; CYS-1006; CYS-1031; CYS-1231 AND ARG-1261, AND VARIANTS
RP ARG-170; LEU-496; GLN-1133; MET-1183 AND ALA-2223.
RX MEDLINE=98049753; PubMed=9388399;
RA Joutel A., Vahedi K., Corpechot C., Troesch A., Chabriat H.,
RA Vayssiere C., Cruaud C., Maciazek J., Weissenbach J., Bousser M.-G.,
RA Bach J.-F., Tournier-Lasserve E.;

RT "Strong clustering and stereotyped nature of Notch3 mutations in 

RT CADASIL patients."; 

RL Lancet 350:1511-1515(1997). 

RN [5] 

RP VARIANT CADASIL 114-GLY--PRO-120 DEL. 

RX MEDLINE=20264473; PubMed=10802807; 

RA Joutel A., Chabriat H., Vahedi K., Domenga V., Vayssiere C., 

RA Ruchoux M.M., Lucas C., Leys D., Bousser M.-G., Tournier-Lasserve E.; 

RT "Splice site mutation causing a seven amino acid Notch3 in-frame 

RT deletion in CADASIL."; 

RL Neurology 54:1874-1875(2000). 

RN [6] 

RP IDENTIFICATION OF LIGANDS. 

RX MEDLINE=99180765; PubMed=10079256; 

RA Gray G.E., Mann R.S., Mitsiadis E., Henrique D., Carcangiu M.-L., 

RA Banks A., Leiman J., Ward D., Ish-Horowitz D., Artavanis-Tsakonas S.; 

RT "Human ligands of the Notch receptor."; 

RL Am. J. Pathol. 154:785-794(1999). 

CC -!- FUNCTION: Functions as a receptor for membrane-bound ligands 
CC Jagged1, Jagged2 and Delta1 to regulate cell-fate determination. 

CC Upon ligand activation through the released notch intracellular 

CC domain (NICD) it forms a transcriptional activator complex with 

CC RBP-J kappa and activates genes of the enhancer of split locus. 

CC Affects the implementation of differentiation, proliferation and 

CC apoptotic programs (By similarity). 

CC -!- SUBUNIT: Heterodimer of a C-terminal fragment N(TM) and a N- 
CC terminal fragment N(EC) which are probably linked by disulfide 

CC bonds (By similarity). 

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. Following 
CC proteolytical processing NICD is translocated to the nucleus. 

CC -!- TISSUE SPECIFICITY: Ubiquitously expressed in fetal and adult 
CC tissues. 

CC -!- PTM: Synthesized in the endoplasmic reticulum as an inactive form 



CC which is proteolytically cleaved by a furin-like convertase in the 

CC trans-Golgi network before it reaches the plasma membrane to yield 

CC an active, ligand-accessible form. Cleavage results in a C- 

CC terminal fragment N(TM) and a N-terminal fragment N(EC). Following 

CC ligand binding, it is cleaved by TNF-alpha converting enzyme 

CC (TACE) to yield a membrane-associated intermediate fragment called 

CC notch extracellular truncation (NEXT). This fragment is then 

CC cleaved by presenilin dependent gamma-secretase to release a 

CC notch-derived peptide containing the intracellular domain (NICD) 

CC from the membrane (By similarity). 

CC -!- PTM: Phosphorylated (By similarity). 

CC -!- DISEASE: Defects in NOTCH3 are associated with cerebral autosomal 

CC dominant arteriopathy with subcortical infarcts and 

CC leukoencephalopathy (CADASIL) [MIM:125310]. CADASIL causes a type 

CC of stroke and dementia of which key features include recurrent 

CC subcortical ischemic events and vascular dementia. 

CC -!- SIMILARITY: Belongs to the NOTCH family. 

CC -!- SIMILARITY: Contains 34 EGF-like domains. 

CC -!- SIMILARITY: Contains 3 Lin/Notch repeats. 

CC -!- SIMILARITY: Contains 5 ANK repeats. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; U97669; AAB91371.1; -. 

DR EMBL; AF058900; AAC14346.1; -. 

DR EMBL; AF058881; AAC14346.1; JOINED. 

DR EMBL; AF058882; AAC14346.1; JOINED. 

DR EMBL; AF058883; AAC14346.1; JOINED. 

DR EMBL; AF058884; AAC14346.1; JOINED. 

DR EMBL; AF058885; AAC14346.1; JOINED. 

DR EMBL; AF058886; AAC14346.1; JOINED. 

DR EMBL; AF058887; AAC14346.1; JOINED. 

DR EMBL; AF058888; AAC14346.1; JOINED. 

DR EMBL; AF058889; AAC14346.1; JOINED. 

DR EMBL; AF058890; AAC14346.1; JOINED. 

DR EMBL; AF058891; AAC14346.1; JOINED. 

DR EMBL; AF058892; AAC14346.1; JOINED. 

DR EMBL; AF058893; AAC14346.1; JOINED. 

DR EMBL; AF058894; AAC14346.1; JOINED. 

DR EMBL; AF058895; AAC14346.1; JOINED. 

DR EMBL; AF058896; AAC14346.1; JOINED. 

DR EMBL; AF058897; AAC14346.1; JOINED. 

DR EMBL; AF058898; AAC14346.1; JOINED. 

DR EMBL; AF058899; AAC14346.1; JOINED. 

DR EMBL; AC004257; AAC04897.1; 

DR EMBL; AC004663; AAC15789.1; ALT_INIT. 

DR PIR; S78549; S78549. 

DR HSSP; P00740; 1EDM. 

DR Genew; HGNC:7883; NOTCH3. 

DR MIM; 600276; -. 

DR MIM; 125310; -. 
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25. 






DR PROSITE; PS50026; EGF_3; 34. 
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12, CALCIUM-BINDING (POTENTIAL 


FT DOMAIN 507 543 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL). 

FT DOMAIN 545 580 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL). 

FT DOMAIN 582 618 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL). 

FT DOMAIN 620 655 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL). 

FT DOMAIN 657 693 EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL). 

FT DOMAIN 695 730 EGF-LIKE 18. 



FT DOMAIN 734 770 EGF-LIKE 19. 

FT DOMAIN 771 808 EGF-LIKE 20. 

FT DOMAIN 810 847 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL). 

FT DOMAIN 849 885 EGF-LIKE 22, CALCIUM-BINDING (POTENTIAL). 

FT DOMAIN 887 922 EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 

Query Match 19.4%; Score 697; DB 1; Length 2321; 

Best Local Similarity 25.2%; Pred. No. 8.7e-38; 

Matches 226; Conservative 61; Mismatches 249; Indels 360; Gaps 51; 
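
The percentages in the summary lines above can be checked by hand. The following is a minimal sketch, not documented GenCore behavior: it assumes that "Query Match" is the hit score divided by the perfect score (3601 in the report header) and that "Best Local Similarity" is the number of matches divided by the sum of matches, conservative substitutions, mismatches and indels. The function names are hypothetical. Under these assumptions the reported figures are reproduced.

```python
# Formulas below are assumptions inferred from the reported numbers,
# not taken from GenCore documentation.
PERFECT_SCORE = 3601  # "Perfect score" from the search header


def query_match(score, perfect=PERFECT_SCORE):
    """Hit score as a percentage of the perfect (self-match) score."""
    return round(100.0 * score / perfect, 1)


def best_local_similarity(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all aligned columns."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Values copied from the summary lines of this result.
print(query_match(697))                          # report states 19.4%
print(best_local_similarity(226, 61, 249, 360))  # report states 25.2%
```

The same two functions can be applied to the summary lines of any other result in this listing, since each one reports a score and the four column counts.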

jNSCLS FICLLLCHWIGTASPLNLED PNVC-SHWESY 4 0 

I I I I | | : : : | | : : : : : I I : 



I I : I : I I I I 1 I : : I i I 

SCTCPSGFSGSTCOLDVDECASTPCRNGAKCVDQPDGY ECRCAEGF 537 



I : I I I : i I I I II II : I I I I : I I I I I I I : 

-EGTLCDRNVDDCSPDPCHHGRCVDGIASFSCACAPGYTGTRCESQVD ECRSQ 5 89 



I : : I I : I I : I I I I II I : I I I I I I I 



-CDHVTGECRCPPGYTGAFCED LCPPGKHGPQC EQRC PCQNGG 244 

I : II I III I I I I I II I I I : I 



VCHHVTG — ECSCPSGWMGTVCGQ PCPEGRFGK 275 

: I : I I I I I I I I Mil: 



Qy 


5 


Db 


432 


Qy 


41 


Db 


492 


Qy 


99 


Db 


538 


Qy 


152 


Db 


590 


Qy 


201 


Db 


649 


Qy 


245 


Db 


708 


Qy 


276 


Db 


768 


Qy 


321 


Db 


827 


Qy 


353 


Db 


886 


Qy 


382 


Db 


939 


Qy 


411 


Db 


999 


Qy 


443 


Db 


1059 



-QCHNGGTCDAATGQ CHCSPGYTGERCQ DEC PVGTYGVLC 320 

I : I I I : : I I I II hill III I I : I : I 



-TCQ CVNGGKCYHVSG — ACLCEAGFAGERCEA 352 

II I : I I I I 1:111111111 



-LCPEGLYGIKCDKRCPCHLENTHSCHPM 381 

I I I I I : : I II 
:TCPPGYGGFHCEQDLP DCSPSSCFNGGT 9 38 



I I I : I I : = I : I II I I I I 



-CS CQNGADCDSVTGKCTCAPGFKGIDC STP 4 42 

II Mill lllhll I I 



-CPLGTYGINCSSRCG CKNDAVCSPVDGS — CTCKAGWHGVD 481 

I I I I : I I : : I I | | | : : I : 



Qy 482 CS IRCPSGTWGFGCNLT C QCLN 503 

I I I I I I I : I : I I : 

Db 1119 CEDDVDECASQPCQHGGSCIDLVARYLCSCPPGTLGVLCEINEDDCGPGPPLDSGPRCLH 1178 

Qy 504 GGACNTLDG--TCTCAPGWRGEKCEL PCQDGTYGLNCAERCDCSHADGCHPTTG 555 

I I 1 I I I I I I : I : I I I = I I : I I I 

Db 1179 NGTCVDLVGGFRCTCPPGYTGLRCEADINECRSGA CHAAHTRDCLQDPGGGF 1230 

Qy 556 HCRCLPGWSGVHCDSV CAEGRWGPNC 581 

I I I : I I I : I Ihllll 
Db 1231 RCLCHAGFSGPRCQTVLSPCESQPCQHGGQCRPSPGPGGGLTFTCHCAQPFWGPRC 1286 



RESULT 7 
NTC1_MOUSE 

ID NTC1_MOUSE STANDARD; PRT; 2531 AA. 

AC Q01705; Q06007; Q61905; Q99JC2; Q9QW58; Q9R0X7; 

DT 01-NOV-1995 (Rel. 32, Created) 

DT 01-FEB-1996 (Rel. 33, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE Neurogenic locus notch homolog protein 1 precursor (Notch 1) (Motch A) 

DE (mT14) (p300). 

GN NOTCH1 OR MOTCH. 

OS Mus musculus (Mouse). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. (ISOFORM 1). 

RC TISSUE=Embryo; 

RX MEDLINE=93194170; PubMed=8449489; 

RA Franco del Amo F., Gendron-Maguire M., Swiatek P.J., Jenkins N.A., 

RA Copeland N.G., Gridley T.; 

RT "Cloning, analysis, and chromosomal localization of Notch-1, a mouse 

RT homolog of Drosophila Notch."; 

RL Genomics 15:259-264(1993). 

RN [2] 

RP SEQUENCE OF 731-1899 FROM N.A. (ISOFORM 2), AND DEVELOPMENTAL STAGE. 

RC STRAIN=CD-1; TISSUE=Embryo; 

RX MEDLINE=93050801; PubMed=1426644; 

RA Reaume A.G., Conlon R.A., Zirngibl R., Yamaguchi T.P., Rossant J.; 

RT "Expression analysis of a Notch homologue in the mouse embryo."; 

RL Dev. Biol. 154:377-387(1992). 

RN [3] 

RP SEQUENCE OF 1551-1647 FROM N.A. (ISOFORM 1), AND DEVELOPMENTAL STAGE. 

RC TISSUE=Embryo; 

RX MEDLINE=93048835; PubMed=1425352; 

RA Franco del Amo F., Smith D.E., Swiatek P.J., Gendron-Maguire M., 

RA Greenspan R.J., McMahon A.P., Gridley T.; 

RT "Expression pattern of Motch, a mouse homolog of Drosophila Notch, 

RT suggests an important role in early postimplantation mouse 

RT development."; 

RL Development 115:737-744(1992). 

RN [4] 

RP SEQUENCE OF 1161-1547 FROM N.A. 

RC STRAIN=C57BL/6 X CBA; TISSUE=Embryo; 

RX MEDLINE=93178563; PubMed=8440332; 

RA Lardelli M., Lendahl U.; 

RT "Motch A and Motch B-two mouse Notch homologues coexpressed in a 

RT wide variety of tissues."; 

RL Exp. Cell Res. 204:364-372(1993). 

RN [5] 

RP SEQUENCE OF 1659-1673 FROM N.A. 

RX MEDLINE=99364499; PubMed=10437788; 

RA Lee J.S., Ishimoto A., Yanagawa S.I.; 

RT "Murine leukemia provirus-mediated activation of the Notch1 gene leads 

RT to induction of HES-1 in a mouse T lymphoma cell line, DL-3."; 

RL FEBS Lett. 455:276-280(1999). 

RN [6] 

RP SEQUENCE OF 1950-2201 FROM N.A. 

RX MEDLINE=98029496; PubMed=9384671; 

RA Messerle M., Folio M., Nehls M., Eggert H., Boehm T.; 

RT "Dynamic changes in gene expression during in vitro differentiation of 

RT mouse embryonic stem cells."; 

RL Cytokines Cell. Mol. Ther. 1:139-143(1995). 

RN [7] 

RP SEQUENCE OF 1655-1659, CLEAVAGE BY FURIN-LIKE CONVERTASE, AND 

RP MUTAGENESIS OF 1651-ARG--ARG-1654. 

RX MEDLINE=98318619; PubMed=9653148; 

RA Logeat F., Bessia C., Brou C., LeBail O., Jarriault S., Seidah N.G., 

RA Israel A.; 

RT "The Notch1 receptor is cleaved constitutively by a furin-like 

RT convertase."; 

RL Proc. Natl. Acad. Sci. U.S.A. 95:8108-8112(1998). 

RN [8] 

RP PARTIAL SEQUENCE, AND POST-TRANSLATIONAL PROCESSING. 

RX MEDLINE=21523956; PubMed=11518718; 

RA Saxena M.T., Schroeter E.H., Mumm J.S., Kopan R.; 

RT "Murine notch homologs (N1-4) undergo presenilin-dependent 

RT proteolysis."; 

RL J. Biol. Chem. 276:40268-40273(2001). 

RN [9] 

RP POST-TRANSLATIONAL PROCESSING. 

RX MEDLINE=21374376; PubMed=11459941; 

RA Mizutani T., Taniguchi Y., Aoki T., Hashimoto N., Honjo T.; 

RT "Conservation of the biochemical mechanisms of signal transduction 

RT among mammalian Notch family members."; 

RL Proc. Natl. Acad. Sci. U.S.A. 98:9026-9031(2001). 

RN [10] 

RP INTERACTION WITH DTX1 AND DTX2. 

RX MEDLINE=21123790; PubMed=11226752; 

RA Kishi N., Tang Z., Maeda Y., Hirai A., Mo R., Ito M., Suzuki S., 

RA Nakao K., Kinoshita T., Kadesch T., Hui C.-C., Artavanis-Tsakonas S., 

RA Okano H., Matsuno K.; 

RT "Murine homologs of deltex define a novel gene family involved in 

RT vertebrate Notch signaling and neurogenesis."; 

RL Int. J. Dev. Neurosci. 19:21-35(2001). 

CC -!- FUNCTION: Functions as a receptor for membrane-bound ligands 
CC Jagged1, Jagged2 and Delta1 to regulate cell-fate determination. 

CC Upon ligand activation through the released notch intracellular 

CC domain (NICD) it forms a transcriptional activator complex with 

CC RBP-J kappa and activates genes of the enhancer of split locus. 

CC Affects the implementation of differentiation, proliferation and 

CC apoptotic programs (By similarity). May play an essential role in 

CC postimplantation development, probably in some aspect of cell 

CC specification and/or differentiation. May be involved in mesoderm 

CC development, somite formation and neurogenesis. Involved in the 

CC maturation of both CD4+ and CD8+ cells in the thymus. 

CC -!- SUBUNIT: Heterodimer of a C-terminal fragment N(TM) and a N- 

CC terminal fragment N(EC) which are probably linked by disulfide 

CC bonds. Interacts with DTX1 and DTX2. 

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. Following 

CC proteolytical processing NICD is translocated to the nucleus. 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=2; 

CC Name=1; 

CC IsoId=Q01705-1; Sequence=Displayed; 

CC Name=2; 

CC IsoId=Q01705-2; Sequence=VSP_001402, VSP_001403, VSP_001404; 

CC Note=No experimental confirmation available; 

CC -!- TISSUE SPECIFICITY: Highly expressed in the brain, lung and 
CC thymus. Expressed at lower levels in the spleen, bone-marrow, 

CC spinal cord, eyes, mammary gland, liver, intestine, skeletal 

CC muscle, kidney and heart. 

CC -!- DEVELOPMENTAL STAGE: First detected in the mesoderm at 7.5 dpc. By 

CC 8.5 dpc highly expressed in presomitic mesoderm, mesenchyme and 

CC endothelial cells, while much lower levels are seen in the 

CC neuroepithelium. Between 9.5-10.5 dpc expressed at high levels in 

CC the neuroepithelium. At 13.5 dpc expressed in the surface 

CC ectoderm, eye and developing whisker follicles. 

CC -!- PTM: Synthesized in the endoplasmic reticulum as an inactive form 

CC which is proteolytically cleaved by a furin-like convertase in the 

CC trans-Golgi network before it reaches the plasma membrane to yield 

CC an active, ligand-accessible form. Cleavage results in a C- 

CC terminal fragment N(TM) and a N-terminal fragment N(EC). Following 

CC ligand binding, it is cleaved by TNF-alpha converting enzyme 

CC (TACE) to yield a membrane-associated intermediate fragment called 

CC notch extracellular truncation (NEXT). This fragment is then 

CC cleaved by presenilin dependent gamma-secretase to release a 

CC notch-derived peptide containing the intracellular domain (NICD) 

CC from the membrane. 

CC -!- PTM: Phosphorylated. 

CC -!- SIMILARITY: Belongs to the NOTCH family. 

CC -!- SIMILARITY: Contains 36 EGF-like domains. 

CC -!- SIMILARITY: Contains 3 Lin/Notch repeats. 

CC -!- SIMILARITY: Contains 5 ANK repeats. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; Z11886; CAA77941.1; 

DR EMBL; L02613; AAK14898.1; 

DR EMBL; X68278; CAA48339.1; 

DR EMBL; AJ238029; CAB40733.1; 

DR EMBL; X82562; CAA57909.1; 



DR PIR; A46019; A46019. 
DR PIR; B49175; B49175. 
DR HSSP; P00740; 1EDM. 
DR MGD; MGI:97363; Notch1. 
DR GO; GO:0005887; C:integral to plasma membrane; IC. 
DR GO; GO:0005515; F:protein binding; IPI. 
DR GO; GO:0030154; P:cell differentiation; IMP. 
DR GO; GO:0007386; P:compartment specification; IMP. 
DR GO; GO:0007219; P:N signaling pathway; IC. 
DR GO; GO:0045944; P:positive regulation of transcription from P.; IDA. 
DR InterPro; IPR002110; ANK. 
DR InterPro; IPR000152; Asx_hydroxyl_S. 
DR InterPro; IPR000742; EGF_2. 
DR InterPro; IPR001881; EGF_Ca. 
DR InterPro; IPR001438; EGF_II. 
DR InterPro; IPR006209; EGF_like. 
DR InterPro; IPR002049; Laminin_EGF. 
DR InterPro; IPR008297; Notch. 
DR InterPro; IPR000800; Notch_dom. 
DR Pfam; PF00023; ank; 7. 
DR Pfam; PF00008; EGF; 35. 
DR Pfam; PF00066; notch; 3. 
DR PIRSF; PIRSF002279; Notch; 1. 
DR PRINTS; PR00010; EGFBLOOD. 
DR PRINTS; PR00011; EGFLAMININ. 
DR PRINTS; PR01452; NOTCH. 
DR SMART; SM00248; ANK; 6. 
DR SMART; SM00179; EGF_CA; 24. 
DR SMART; SM00004; NL; 2. 
DR PROSITE; PS50297; ANK_REP_REGION; 1. 
DR PROSITE; PS50088; ANK_REPEAT; 2. 
DR PROSITE; PS00010; ASX_HYDROXYL; 22. 
DR PROSITE; PS00022; EGF_1; 34. 
DR PROSITE; PS01186; EGF_2; 27. 
DR PROSITE; PS50026; EGF_3; 36. 
DR PROSITE; PS01187; EGF_CA; 21. 
KW Receptor; Transcription regulation; Activator; Differentiation; 
KW Developmental protein; Repeat; ANK repeat; EGF-like domain; 
KW Transmembrane; Glycoprotein; Signal; Phosphorylation; 
KW Alternative splicing. 
FT SIGNAL 1 18 POTENTIAL. 
FT CHAIN 19 2531 NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1. 
FT CHAIN 1711 2531 NOTCH EXTRACELLULAR TRUNCATION. 
FT CHAIN 1744 2531 NOTCH INTRACELLULAR DOMAIN. 
FT DOMAIN 19 1725 EXTRACELLULAR (POTENTIAL). 



Query Match 19.2%; Score 693; DB 1; Length 2531; 

Best Local Similarity 25.3%; Pred. No. 1.7e-37; 

Matches 217; Conservative 71; Mismatches 206; Indels 364; Gaps 51; 

Qy 10 SFICLLLCHWIGTASPLNLEDPNVCSHWESYSVTVQESYPHPFDQIYYTSCTDILNWFKC 69 
i : : I I I I : II : I I : Ml : | | : : : : I 

Db 624 SYLCLCLKGTTGPNCEINLDD CA SNPCDS GTCLDKIDGYEC 664 

Qy 70 TRHRVS YRTAYRHGEKTMYRRKSQCC P GF YE S GEMC VPHCA 110 

III: : I I I : II 
Db 665 A CEPGY--TGSMCNVNIDECAGSPCHNGGTCEDGIAG 699 

Qy 111 DKCVHGRC IAPNTCQCEPGWGGTNC SSACDG 141 

: I : I I I : I I 1 I 1 I I I I : : I : 

Db 700 FTCRCPEGYHDPTCLSEVNECNSNPCIHGACRDGLNGYKCDCAPGWSGTNCDINNNECES 759 

Qy 142 DHWGPHCTSRCQCKNGALCNPITG--ACHCAAGFRGWRCEDRCEQGTYGNDCHQRCQCQN 199 

: 1111:1 I I I I I I : I I : I I I 

Db 760 N PCVNGGTCKDMTSGYVCTCREGFSGPNCQ TNINECASN-PCLN 802 

Qy 200 GATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVC HHVTGEC 253 

I | I | I : I I I I I I I I I : I I I I : I I I I : : I 

Db 803 QGTCIDDVAGYKCNCPLPYTGATCEWLAP C-ATSPCKNSGVCKESEDYESFSC 855 

Qy 254 SCPSGWMGTVC GQPCPEGR FGKNCS QECQ 282 

I I : II I I Ml I : I I : h 

Db 856 VCPTGWQGQTCEVDINECVKSPCRHGASCQNTNGSYRCLCQAGYTGRNCESDIDDCRPNP 915 

Qy 283 CHNGGTCDAA--TGQCHCSPGYTGERCQDE CPVGT 315 

M I I I : I | | | | | : | | :: : I I I I 

Db 916 CHNGGSCTDGINTAFCDCLPGFQGAFCEEDINECASNPCQNGANCTDCVDSYTCTCPVGF 975 

Qy 316 YGVLCAET CQCVNGGKCYHVSG ACLCEAGFAGERCEARLCPEGLYGI-KC 364 

I : I I I I I I I I I I I I I I I : I : : I 

Db 976 NGIHCENNTPDCTESSCFNGGTC--VDGINSFTCLCPPGFTGSYCQ YDVNEC 1025 

Qy 365 DKRCPCHLENTHSCHPMSG--ECACKPGWSGLYCNE TCSPGFYGEACQQI 412 

I I II II I : I I I : : I I I : I III 

Db 1026 DSR-PCLHGGT--CQDSYGTYKCTCPQGYTGLNCQNLVRWCDSAPCKNGGRCWQTNTQYH 1082 

Qy 413 CSCQN GADCDSVTGKCTCAPGFKGIDCSTPCPLGTYGIN CSSRCG 457 

II:: I : I I s : I I : I I I : I I : : I : I 

Db 1083 CECRSGWTGVNCDVLSVSCEVAAQKRGIDVTLLCQHGGLCVDEGDKHYCHCQAGYTGSYC 1142 

Qy 458 CKNDAVCSPVDG--SCTCKAGWHGVDCS 483 

I : I I I : 111111:11:11 
Db 1143 EDEVDECSPNPCQNGATCTDYLGGFSCKCVAGYHGSNCSEEINECLSQPCQNGGTCIDLT 1202 

Qy 484 IRCPSGTWGFGCNLT C QCLNGGACNTLDG--TCTCAPGWRGE 523 

I I I I I I : I : I I I I I I I II I I : I I 

Db 1203 NSYKCSCPRGTQGVHCEINVDDCHPPLDPASRSPKCFNNGTCVDQVGGYTCTCPPGFVGE 1262 

Qy 524 KCE LPCQD-GTYGLNCAER CDC SHADGC 550 

: I I II II I I : I I : i I = I I 

Db 1263 RCEGDVNECLSNPCDPRGTQ--NCVQRWDFHCECRAGHTGRRCESVINGCRGKPCKNGG 1320 

Qy 551 HPTTGH-CRCLPGWSGVHCDS VCAEGRWGPNC 581 

: I I II I : I I : : fill 
Db 1321 VCAVASNTARGFICRCPAGFEGATCENDARTCGSLRCLNGGTCISGPRSPTCLCLGSFTG 1380 

Qy 582 SLPCY 586 

I I I I 

Db 1381 PECQFPASSPCVGSNPCY 1398 



RESULT 8 
NTC3_MOUSE 

ID NTC3_MOUSE STANDARD; PRT; 2318 AA. 



AC Q61982; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Neurogenic locus notch homolog protein 3 precursor (Notch 3). 

GN NOTCH3. 

OS Mus musculus (Mouse). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=ICR X Swiss Webster; 

RX MEDLINE=95001556; PubMed=7918097; 

RA Lardelli M., Dalstrand J., Lendahl U.; 

RT "The novel Notch homologue mouse Notch 3 lacks specific epidermal 

RT growth factor-repeats and is expressed in proliferating 

RT neuroepithelium."; 

RL Mech. Dev. 46:123-136(1994). 

RN [2] 

RP POST-TRANSLATIONAL PROCESSING, AND MUTAGENESIS OF MET-1664. 

RX MEDLINE=21523956; PubMed=11518718; 

RA Saxena M.T., Schroeter E.H., Mumm J.S., Kopan R.; 

RT "Murine notch homologs (N1-4) undergo presenilin-dependent 

RT proteolysis."; 

RL J. Biol. Chem. 276:40268-40273(2001). 

RN [3] 

RP POST-TRANSLATIONAL PROCESSING. 

RX MEDLINE=21374376; PubMed=11459941; 

RA Mizutani T., Taniguchi Y., Aoki T., Hashimoto N., Honjo T.; 

RT "Conservation of the biochemical mechanisms of signal transduction 

RT among mammalian Notch family members."; 

RL Proc. Natl. Acad. Sci. U.S.A. 98:9026-9031(2001). 

CC -!- FUNCTION: Functions as a receptor for membrane-bound ligands 

CC Jagged1, Jagged2 and Delta1 to regulate cell-fate determination. 

CC Upon ligand activation through the released notch intracellular 

CC domain (NICD) it forms a transcriptional activator complex with 

CC RBP-J kappa and activates genes of the enhancer of split locus. 

CC Affects the implementation of differentiation, proliferation and 

CC apoptotic programs (By similarity). May play a role during CNS 

CC development. 

CC -!- SUBUNIT: Heterodimer of a C-terminal fragment N(TM) and a N- 

CC terminal fragment N(EC) which are probably linked by disulfide 

CC bonds. 

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. Following 

CC proteolytical processing NICD is translocated to the nucleus. 

CC -!- TISSUE SPECIFICITY: Proliferating neuroepithelium. 

CC -!- DEVELOPMENTAL STAGE: CNS development. 

CC -!- PTM: Synthesized in the endoplasmic reticulum as an inactive form 

CC which is proteolytically cleaved by a furin-like convertase in the 

CC trans-Golgi network before it reaches the plasma membrane to yield 

CC an active, ligand-accessible form. Cleavage results in a C- 

CC terminal fragment N(TM) and a N-terminal fragment N(EC). Following 

CC ligand binding, it is cleaved by TNF-alpha converting enzyme 

CC (TACE) to yield a membrane-associated intermediate fragment called 

CC notch extracellular truncation (NEXT). This fragment is then 

CC cleaved by presenilin dependent gamma-secretase to release a 
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notch-derived peptide containing the intracellular domain (NICD) 
CC from the membrane. 
CC -!- PTM: Phosphorylated. 
CC -!- SIMILARITY: Belongs to the NOTCH family. 
CC -!- SIMILARITY: Contains 34 EGF-like domains. 
CC -!- SIMILARITY: Contains 3 Lin/Notch repeats. 
CC -!- SIMILARITY: Contains 5 ANK repeats. 
CC This SWISS-PROT entry is copyright. It is produced through a collaboration 
CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 
CC the European Bioinformatics Institute. There are no restrictions on its 
CC use by non-profit institutions as long as its content is in no way 
CC modified and this statement is not removed. Usage by and for commercial 
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 
CC or send an email to license@isb-sib.ch). 

DR EMBL; X74760; CAA52776.1; -. 
DR PIR; S45306; S45306. 
DR HSSP; P00740; 1EDM. 
DR MGD; MGI:99460; Notch3. 
DR GO; GO:0005887; C:integral to plasma membrane; IC. 
DR GO; GO:0005515; F:protein binding; IPI. 
DR GO; GO:0007219; P:N signaling pathway; IC. 
DR InterPro; IPR002110; ANK. 
DR InterPro; IPR000152; Asx_hydroxyl_S. 
DR InterPro; IPR000742; EGF_2. 
DR InterPro; IPR001881; EGF_Ca. 
DR InterPro; IPR001438; EGF_II. 
DR InterPro; IPR006209; EGF_like. 
DR InterPro; IPR002049; Laminin_EGF. 
DR InterPro; IPR008297; Notch. 
DR InterPro; IPR000800; Notch_dom. 
DR Pfam; PF00023; ank; 6. 
DR Pfam; PF00008; EGF; 33. 
DR Pfam; PF00066; notch; 3. 
DR PIRSF; PIRSF002279; Notch; 1. 
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PROSITE; PS50297; ANK_REP_REGION; 1. 
DR PROSITE; PS50088; ANK_REPEAT; 4. 
DR PROSITE; PS00010; ASX_HYDROXYL; 18. 
DR PROSITE; PS00022; EGF_1; 33. 
DR PROSITE; PS01186; EGF_2; 27. 
DR PROSITE; PS50026; EGF_3; 34. 
DR PROSITE; PS01187; EGF_CA; 16. 
KW Receptor; Transcription regulation; Activator; Differentiation; 
KW Developmental protein; Repeat; ANK repeat; EGF-like domain; 
KW Transmembrane; Glycoprotein; Signal; Phosphorylation. 
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Query Match 19.0%; Score 685.5; DB 1; Length 2318; 

Best Local Similarity 24.4%; Pred. No. 4.8e-37; 

Matches 216; Conservative 59; Mismatches 195; Indels 417; Gaps 48; 

Qy 94 CCPGFYESGEMCVPHCADKCVHGRCIAPNT CQCEPGWGGTNCSSACDGDHW 144 

I M I | : I : I I II: I Mil: I 
Db 225 CLPGF--EGQNCEVN-VDDCPGHRCLNGGTCVDGVNTYNCQCPPEWTGQFCTEDVD 277 

Qy 145 GPHCTSRCQ CKNGALCNPITG--ACHCAAG FRGWRCE 179 

II I I I I : I : I I I III 

Db 278 ECQLQPNACHNGGTCFNLLGGHSCVCVNGWTGESCSQNIDDCATAVCFHGATCH 331 

Qy 180 DR CEQGTYGNDCH--QRC QCQNGATCD--HVTGE--CRCPPGYTGAFCED 223 

|| I I I I I i I III 1:1 I I I 1 I : I i I : 

Db 332 DRVASFYCACPMGKTGLLCHLDDACVSNPCHEDAICDTNPVSGRAICTCPPGFTGGACDQ 391 

Qy 224 LCPPGKHGPQCE 235 

I I I I : I I 

Db 392 DVDECSIGANPCEHLGRCVNTQGSFLCQCGRGYTGPRCETDVNECLSGPCRNQATCLDRI 451 



Qy 236 QRCPCQNGGVC-HHVTG-ECSCPSGWMGTVC 264 

I I I I I I I I I I I : I I I I : I : : I 

Db 452 GQFTCICMAGFTGTYCEVDIDECQSSPCVNGGVCKDRVNGFSCTCPSGFSGSMCQLDVDE 511 

Qy 265 GQP CPEGRFGKNCSQ ECQ CHNGGTCDA-ATGQCHC 298 

II I I I I I : : 1 I I : I I I : I I 

Db 512 CASTPCRNGAKCVDQPDGYECRCAEGFEGTLCERNVDDCSPDPCHHGRCVDGIASFSCAC 571 
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RESULT 9 


NTC2_RAT 

ID NTC2_RAT STANDARD; PRT; 2471 AA. 

AC Q9QW30; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Neurogenic locus notch homolog protein 2 precursor (Notch 2). 

GN NOTCH2. 

OS Rattus norvegicus (Rat). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus. 

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Brain; 

RX MEDLINE=93202015; PubMed=1295745; 

RA Weinmaster G., Roberts V.J., Lemke G.; 

RT "Notch2: a second mammalian Notch gene."; 

RL Development 116:931-941(1992). 

RN [2] 

RP TISSUE SPECIFICITY. 

RX MEDLINE=21331789; PubMed=11438922; 

RA Irvin D.K., Zurcher S.D., Nguyen T., Weinmaster G., Kornblum H.I.; 

RT "Expression patterns of Notch1, Notch2, and Notch3 suggest multiple 

RT functional roles for the Notch-DSL signaling system during brain 

RT development."; 

RL J. Comp. Neurol. 436:167-181(2001). 

CC -!- FUNCTION: Functions as a receptor for membrane-bound ligands 
CC Jagged1, Jagged2 and Delta1 to regulate cell-fate determination. 

CC Upon ligand activation through the released notch intracellular 

CC domain (NICD) it forms a transcriptional activator complex with 

CC RBP-J kappa and activates genes of the enhancer of split locus. 

CC Affects the implementation of differentiation, proliferation and 

CC apoptotic programs. May play an essential role in postimplantation 

CC development, probably in some aspect of cell specification and/or 

CC differentiation (By similarity). 

CC -!- SUBUNIT: Heterodimer of a C-terminal fragment N(TM) and a N- 
CC terminal fragment N(EC) which are probably linked by disulfide 

CC bonds (By similarity). 

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. Following 

CC proteolytical processing NICD is translocated to the nucleus. 

CC -!- TISSUE SPECIFICITY: Highly expressed in the spleen and choroid 

CC plexus in the brain. Expressed in postnatal central nervous system 

CC (CNS) germinal zones and, in early postnatal life, within numerous 

CC cells throughout the CNS. It is more highly localized to 

CC ventricular germinal zones. Also found in the heart, liver and 

CC kidney. 

CC -!- DEVELOPMENTAL STAGE: Expressed in the brain during E14 and E17. 

CC -!- PTM: Synthesized in the endoplasmic reticulum as an inactive form 

CC which is proteolytically cleaved by a furin-like convertase in the 

CC trans-Golgi network before it reaches the plasma membrane to yield 

CC an active, ligand-accessible form. Cleavage results in a C- 

CC terminal fragment N(TM) and a N-terminal fragment N(EC). Following 

CC ligand binding, it is cleaved by TNF-alpha converting enzyme 

CC (TACE) to yield a membrane-associated intermediate fragment called 

CC notch extracellular truncation (NEXT). This fragment is then 

CC cleaved by presenilin dependent gamma-secretase to release a 

CC notch-derived peptide containing the intracellular domain (NICD) 

CC from the membrane (By similarity). 

CC -!- PTM: Phosphorylated (By similarity). 

CC -!- SIMILARITY: Belongs to the NOTCH family. 

CC -!- SIMILARITY: Contains 35 EGF-like domains. 

CC -!- SIMILARITY: Contains 2 Lin/Notch repeats. 

CC -!- SIMILARITY: Contains 6 ANK repeats. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; M93661; AAK13558.1; -. 

DR PIR; A49128; A49128. 

DR HSSP; P00743; 1CCF. 
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InterPro; 
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InterPro ; 
DR InterPro; IPR002110; ANK. 
DR InterPro; IPR000152; Asx_hydroxyl_S. 
DR InterPro; IPR000742; EGF_2. 
DR InterPro; IPR001881; EGF_Ca. 
DR InterPro; IPR001438; EGF_II. 
DR InterPro; IPR006209; EGF_like. 
DR InterPro; IPR002049; Laminin_EGF. 
DR InterPro; IPR008297; Notch. 
DR InterPro; IPR000800; Notch_dom. 
DR Pfam; PF00023; ank; 6. 
DR Pfam; PF00008; EGF; 35. 
DR Pfam; PF00066; notch; 2. 
DR PIRSF; PIRSF002279; Notch; 1. 
DR PRINTS; PR00010; EGFBLOOD. 
DR PRINTS; PR00011; EGFLAMININ. 
DR PRINTS; PR01452; NOTCH. 
DR SMART; SM00248; ANK; 6. 
DR SMART; SM00179; EGF_CA; 24. 
DR SMART; SM00004; NL; 2. 
DR PROSITE; PS50297; ANK_REP_REGION; 1. 
DR PROSITE; PS50088; ANK_REPEAT; 4. 
DR PROSITE; PS00010; ASX_HYDROXYL; 22. 
DR PROSITE; PS00022; EGF_1; 34. 
DR PROSITE; PS01186; EGF_2; 26. 
DR PROSITE; PS50026; EGF_3; 35. 
DR PROSITE; PS01187; EGF_CA; 22. 
KW Receptor; Transcription regulation; Activator; Differentiation; 
KW Developmental protein; Repeat; ANK repeat; EGF-like domain; 
KW Transmembrane; Glycoprotein; Signal; Phosphorylation. 
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BY 


SIMILARITY. 


FT 


DISULFID 
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BY 


SIMILARITY. 


FT 


DISULFID 
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BY 


SIMILARITY. 


FT 


DISULFID 
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BY 


SIMILARITY. 


FT 


DISULFID 
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BY 


SIMILARITY. 



Query Match 19.0%; Score 685.5; DB 1; Length 2471; 

Best Local Similarity 24.8%; Pred. No. 5e-37; 

Matches 218; Conservative 83; Mismatches 240; Indels 339; Gaps 55; 



Qy 3 ISLNSCLSFICL LLCH WIGTASPLNLE--DPNVCSHW ES 39 

| : : I I ! I : I : I I I : : i I : I I : I 

Db 531 IDIDDCSSTPCLNGAKCIDHPNGYECQCATGFTGTLCDENIDNCDPDPCHHGQCQDGIDS 590 

Qy 40 YSVTVQESYPHPF--DQI--YYTS CTDILNWFKCT-RHRVSY 76 

| : i III I : I | | : : | : : | : : : 

Db 591 YTCICNPGYMGAICSDQIDECYSSPCLNDGRCIDLVNGYQCNCQPGTSGLNCEINFDDCA 650 



Qy 77 RTAYRHGE--KTMYRRKSQCCPGFYESGEMC VPHCADK 112 

II : I I I I I : I : I : I I 

Db 651 SNPCLHGACVDGINRYSCVCSPGF--TGQRCNIDIDECASNPCRKDATCINDVNGFRCMC 708

Qy 113 CVHGRC IAPNTCQCEPGWGGTNCS SACDGDHWGPHCT 149

1:111 : : I h III II I : I 

Db 709 PEGPHHPSCYSQVNECLSSPCIHGNCTGGLSGYKCLCDAGWVGINCE--VDKN ECL 762

Qy 150 SRCQCKNGALCNPITGA--CHCAAGFRGWRCE--DRC EQGTYGNDCH-QRCQC 197

I | : M i I : | | | | : | : I : II I I I : I II 

Db 763 SN-PCQNGGTCNNLVNGYRCTCKKGFKGYNCQVNIDECASNPCLNQGTCLDDVSGYTCHC 821

Qy 198 Q NGATCDHVTGECRCPPGYTGAFCED LCPPGKHGPQCE QRC P 239 

III I I II:: I I I I I : I I I 

Db 822 MLPYTGKNCQTVLAPCSPNPCENAAVCKEAPNFESFTCLCAPGWQGQRCTVDVDECVSKP 881

Qy 240 CQNGGVCHHVTGE--CSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTC--DAA 292 

I I I : I I : I I I I I : I I : : I I I I I : I 

Db 882 CMNNGICHNTQGSYMCECPPGFSGMDCEEDI NDCLANPCQNGGSCVDKVN 931 

Qy 293 TGQCHCSPGYTGERCQDE CPVGTYGVLC AETC 324

I I I I I : I :: I I : I I I : I I I = I 

Db 932 TFSCLCLPGFVGDKCQTDMNECLSEPCKNGGTCSDYVNSYTCTCPAGFHGVHCENNIDEC 991 

Qy 325 QCVNGGKCYHVSG ACLCEAGFAGERC EARLCPEGLYGI KC 364 

I I I I I I I : I II I I I I : I : I I : I 

Db 992 TESSCFNGGTC--VDGINSFSCLCPVGFTGPFCLHDINECSSNPCLNSGTCVDGLGTYRC 1049

Qy 365 DKRC PCHLENTHSCHPMSGECACKPGWSGLYCNE 398 

II II : I : I i I I I I I I : 

Db 1050 TCPLGYTGKNCQTLVNLCSPSPCKNKGTCAQEKARPRCLCPPGWDGAYCDVLNVSCKAAA 1109 



Qy 399 TCSPGFYGEACQQ ICS CQNGADCDSVTG 426

Db 1110 LQKGVPVEHLCQHSGICINAGNTHHCQCPLGYTGSYCEEQLDECASNPCQHGATCSDFIG 1169

Qy 427 --KCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSI 484

: 1 I 1 1 : : 1 : : 1 1 : : 1 : 1 1 : i

Db 1170 GYRCECVPGYQGVNCE YEVDECQNQPCQNGGTCI DLVNHFKCS 1212

Qy 485 RCPSGTWGFGC--NL-TC QCLNGGAC-NTLDG-TCTCAPGWRGEKCE L 527

1 1 I 1 1 1 1 : 1 1 I 1 I i i : : 1 : 1 1 1 1 : 1 1 : 1 1

Db 1213 -CPPGTRGLLCEENIDDCAGAPHCLNGGQCVDRIGGYSCRCLPGFAGERCEGDINECLSN 1271

Qy 528 PC-QDGTYGLNCAE RCDCSHA DGCH PTTG 555

1 1 : 1 : 1 : 1 : Mil II i

Db 1272 PCSSEGS-- -LDCIQLKNNYQCVCRSAFTGRHCETFLDVCPQKPCLNGGTCAVASNVPDGF 1329

Qy 556 HCRCLPGWSGVHCDSVCAEGRW GPNCSLP 584

11111:11 |||:: 1 1 : 1 1

Db 1330 ICRCPPGFSGARCQSSCGQVKCRRGEQCVHTASGPHCFCP 1369





RESULT 10 
NTC3_RAT 

ID NTC3_RAT STANDARD; PRT; 2319 AA. 

AC Q9R172; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Neurogenic locus notch homolog protein 3 precursor (Notch 3).

GN NOTCH3.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. 

RA Haritunians T., Boulter J., Weinmaster G., Schanen N.C.;

RT "Rattus norvegicus mRNA for Notch 3.";

RL Submitted (SEP-2000) to the EMBL/GenBank/DDBJ databases.

RN [2] 

RP FUNCTION. 

RX MEDLINE=21094508; PubMed=11182080; 

RA Tanigaki K., Nogaki F., Takahashi J., Tashiro K., Kurooka H.,

RA Honjo T.;

RT "Notch1 and Notch3 instructively restrict bFGF-responsive multipotent

RT neural progenitor cells to an astroglial fate.";

RL Neuron 29:45-55(2001).

RN [3] 

RP TISSUE SPECIFICITY. 

RX MEDLINE=21331789; PubMed=11438922;

RA Irvin D.K., Zurcher S.D., Nguyen T., Weinmaster G., Kornblum H.I.;

RT "Expression patterns of Notch1, Notch2, and Notch3 suggest multiple

RT functional roles for the Notch-DSL signaling system during brain

RT development.";

RL J. Comp. Neurol. 436:167-181(2001). 

CC -!- FUNCTION: Functions as a receptor for membrane-bound ligands

CC Jagged1, Jagged2 and Delta1 to regulate cell-fate determination.

CC Upon ligand activation through the released notch intracellular 



CC domain (NICD) it forms a transcriptional activator complex with 

CC RBP-J kappa and activates genes of the enhancer of split locus. 

CC Affects the implementation of differentiation, proliferation and 

CC apoptotic programs (By similarity). Acts instructively to control

CC the cell fate determination of CNS multipotent progenitor cells,

CC resulting in astroglial induction and neuron/oligodendrocyte

CC suppression. 

CC -!- SUBUNIT: Heterodimer of a C-terminal fragment N(TM) and a N- 
CC terminal fragment N(EC) which are probably linked by disulfide 

CC bonds (By similarity).

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. Following

CC proteolytical processing NICD is translocated to the nucleus. 

CC -!- TISSUE SPECIFICITY: Expressed in postnatal central nervous system 

CC (CNS) germinal zones and, in early postnatal life, within 

CC numerous cells throughout the CNS. It is more highly localized 

CC to ventricular germinal zones.

CC -!- PTM: Synthesized in the endoplasmic reticulum as an inactive form 

CC which is proteolytically cleaved by a furin-like convertase in the 

CC trans-Golgi network before it reaches the plasma membrane to yield 

CC an active, ligand-accessible form. Cleavage results in a C- 

CC terminal fragment N(TM) and a N-terminal fragment N(EC). Following 

CC ligand binding, it is cleaved by TNF-alpha converting enzyme 

CC (TACE) to yield a membrane-associated intermediate fragment called 

CC notch extracellular truncation (NEXT). This fragment is then

CC cleaved by presenilin dependent gamma-secretase to release a 

CC notch-derived peptide containing the intracellular domain (NICD) 

CC from the membrane (By similarity).

CC -!- PTM: Phosphorylated (By similarity). 

CC -!- SIMILARITY: Belongs to the NOTCH family. 

CC -!- SIMILARITY: Contains 34 EGF-like domains. 

CC -!- SIMILARITY: Contains 3 Lin/Notch repeats. 

CC -!- SIMILARITY: Contains 5 ANK repeats. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF164486; AAD46653.2; -. 

DR HSSP; P00740; 1EDM. 

DR InterPro; IPR002110; ANK. 

DR InterPro; IPR000152; Asx_hydroxyl_S.

DR InterPro; IPR000742; EGF_2.

DR InterPro; IPR001881; EGF_Ca.

DR InterPro; IPR001438; EGF_II.

DR InterPro; IPR006209; EGF_like.

DR InterPro; IPR002049; Laminin_EGF.

DR InterPro; IPR008297; Notch.

DR InterPro; IPR000800; Notch_dom.

DR Pfam; PF00023; ank; 6.

DR Pfam; PF00008; EGF; 33.

DR Pfam; PF00066; notch; 3.

DR PIRSF; PIRSF002279; Notch; 1. 

DR PRINTS; PR00010; EGFBLOOD. 



DR PRINTS; PR00011; EGFLAMININ.

DR PRINTS; PR01452; NOTCH.

DR SMART; SM00248; ANK; 6.

DR SMART; SM00179; EGF_CA; 20.

DR SMART; SM00004; NL; 3.

DR PROSITE; PS50297; ANK_REP_REGION; 1.

DR PROSITE; PS50088; ANK_REPEAT; 4.

DR PROSITE; PS00010; ASX_HYDROXYL; 18.

DR PROSITE; PS00022; EGF_1; 33.

DR PROSITE; PS01186; EGF_2; 26.

DR PROSITE; PS50026; EGF_3; 34.

DR PROSITE; PS01187; EGF_CA; 16.

KW Transmembrane; Glycoprotein;

Query Match 19.0%; Score 682.5; DB 1; Length 2319;

Best Local Similarity 24.7%; Pred. No. 7.5e-37;

Matches 219; Conservative 56; Mismatches 196; Indels 415; Gaps 49;



Qy 94 CCPGFYESGEMCVPHCADKCVHGRCIAPNT CQCEPGWGGTNCSSACDGDHW 144



I I I I I : I : I I ||: I I I I I I I I : I 

Db 226 CLPGF--EGQNCEVN-VDDCPGHRCLNGGTCVDGVNTYNCQCPPEWTGQFCTEDVD 278 

Qy 145 GPHCTSRCQ CKNGALCNPITG--ACHCAAG FRGWRCE 179 

II I I I I : I : I I I 1 I I 

Db 279 ECQLQPNACHNGGTCFNLLGGHSCVCVNGWTGESCSQNIDDCATAVCFHGATCH 332 

Qy 180 DR CEQGTYGNDCH--QRC QCQNGATCD--HVTGE--CRCPPGYTGAFCED 223

II I I I I I I I I I I hi I II I I : I I I : 

Db 333 DRVASFYCACPMGKTGLLCHLDDACVSNPCHEDAICDTNPVSGRAICTCPPGFTGGACDQ 392

Qy 224 LCPPGKHGPQCE 235 

I I I I : I I 

Db 393 DVDECSIGANPCEHLGRCVNTQGSFLCQCGRGYTGPRCETDVNECLSGPCRNQATCLDRI 452 

Qy 236 QRCPCQNGGVC-HHVTG-ECSCPSGWMGTVC 264

I II I I I I I II I : I I I I : I : I 
Db 453 GQFTCICMAGFTGTFCEVDIDECQSSPCVNGGVCKDRVNGFSCTCPSGFSGSTCQLDVDE 512 

Qy 265 GQP CPEGRFGKNCSQ ECQ CHNGGTCDA-ATGQCHC 298 

II I I I I I : : I I I : I I I : I I 

Db 513 CASTPCRNGAKCVDQPDGYECRCAEGFEGTLCERNVDDCSPDPCHHGRCVDGIASFSCAC 572 

Qy 299 SPGYTGERCQDE CPVGTYGVLC AETCQ 325 

: I I I I I I I : : I I I I I I I : I 

Db 573 APGYTGIRCESQVDECRSQPCRYGGKCLDLVDKYLCRCPPGTTGVNCEVNIDDCASNPCT 632 

Qy 326 CVNGGKCYHVSGACLCEAGFAGERCEARL CPEGLYGIKCDKRCP 369 

I : I I I : I : I i I I : I : I I I II 

Db 633 FGVCRDGINRYD CVCQPGFTGPLCNVEINECASSPCGEGGSCVDGENGFHC--LCP 686

Qy 370 CHLENTHS-CHPMSG--ECACKPGWSGLYCNE 398

I : I II I : I I I I I I I I : : 
Db 687 PGSLPPLCLPANHPCAHKPCSHGVCHDAPGGFQCVCDPGWSGPRCSQSLAPDACESQPCQ 746

Qy 399 TCSPGFYGEACQQI--CS CQNGADCDSVTGK CTCAPGFKG 436 

I I : I I I I i : : I : I : : I I : I : I : I I I : : I 
Db 747 AGGTCTSDGIGFHCTCAPGFQGHQCEVLSPCTPSLCEHGGHCESDPDQLTVCSCPPGWQG 806

Qy 437 IDC STPC-PLGT-YGINCSSRCGCKNDAV CSP 466 

I :: I I I I 1 : I I I I II 

Db 807 PRCQQDVDECAGASPCGPHGTCTNLPGSFRCICHGGYTGPFCDQDIDDCDPNPCLNGGSC 866

Qy 467 VDG SCTCKAGWHGVDC S IRCPSGTWGFGCN-- 496

II I I : I : I : I I : I I I I I I 

Db 867 QDGVGSFSCSCLSGFAGPRCARDVDECLSSPCGPGTCTDHVASFTCTCPPGYGGFHCETD 926 

Qy 497 -LTC QCLNGGACNTLDG TCTCAPGWRGEKC 525 

II 11111:11 : I I I I : I I 

Db 927 LLDCSPSSCFNGGTC--VDGVNSFSCLCRPGYTGTHCQYKVDPCFSRPCLHGGICNPTHS 984 

Qy 526 --ELPCQDGTYGLNCAERCD CSHADGCHPTTGHCRCLPGWSGVHCD 569

I I : : I I I I I : I I : I I I II I I I 

Db 985 GFECTCREGFTGNQCQNPVDWCSQAPCQNGGRCVQTGAYCICPPEWSGPLCDIPSLPCTE 1044 

Qy 570 SVCAEGRWGPNCSL PC 585 

111:111 



Db 1045 AAAHMGVRLEQLCQAGGQCIDKDHSHYCVCPEGRMGSHCEQEVDPC 1090



RESULT 11 
TENX_HUMAN 

ID TENX_HUMAN STANDARD; PRT; 4289 AA.

AC P22105; P78530; P78531; Q08424; Q9UMG7;

DT 01-AUG-1991 (Rel. 19, Created)

DT 16-OCT-2001 (Rel. 40, Last sequence update)

DT 10-OCT-2003 (Rel. 42, Last annotation update)

DE Tenascin X precursor (TN-X) (Hexabrachion-like).

GN TNXB OR TNX OR XB OR HXBL.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE FROM N.A. 

RA Rowen L., Dankers C., Baskin D., Faust J., Loretz C., Ahearn M.E.,

RA Banta A., Schwartzell S., Smith T.M., Spies T., Hood L.;

RT "Sequence determination of 300 kilobases of the human class III MHC

RT locus.";

RL Submitted (SEP-1997) to the EMBL/GenBank/DDBJ databases.

RN [2] 

RP SEQUENCE OF 1-747 AND 1687-1944 FROM N.A. 

RC TISSUE=Leukocyte; 

RX MEDLINE=93300909; PubMed=7686164;

RA Bristow J., Tee M.K., Gitelman S.E., Mellon S.H., Miller W.L.; 

RT "Tenascin-X: a novel extracellular matrix protein encoded by the human 

RT XB gene overlapping P450c21B."; 

RL J. Cell Biol. 122:265-278(1993). 

RN [3] 

RP SEQUENCE FROM N.A. (ISOFORM XB-SHORT).

RC TISSUE=Adrenal gland;

RX MEDLINE=96015044; PubMed=8530023;

RA Tee M.K., Thomson A.A., Bristow J., Miller W.L.;

RT "Sequences promoting the transcription of the human XA gene

RT overlapping P450c21A correctly predict the presence of a novel,

RT adrenal-specific, truncated form of tenascin-X.";

RL Genomics 28:171-178(1995).

RN [4] 

RP SEQUENCE OF 1-23 FROM N.A. 

RC TISSUE=Fetal adrenal gland; 

RX MEDLINE=97081760; PubMed=8923003;

RA Speek M., Barry F., Miller W.L.;

RT "Alternate promoters and alternate splicing of human tenascin-X, a

RT gene with 5' and 3' ends buried in other genes.";

RL Hum. Mol. Genet. 5:1749-1758(1996).

RN [5] 

RP SEQUENCE OF 3470-4289 FROM N.A. 

RX MEDLINE=89367293; PubMed=2475872;

RA Morel Y., Bristow J., Gitelman S.E., Miller W.L.;

RT "Transcript encoded on the opposite strand of the human steroid 21-

RT hydroxylase/complement component C4 gene locus.";

RL Proc. Natl. Acad. Sci. U.S.A. 86:6582-6586(1989).

RN [6] 

RP DISEASE. 



RX MEDLINE=21468843; PubMed=11642233;

RA Schalkwijk J., Zweers M.C., Steijlen P.M., Dean W.B., Taylor G.,

RA van Vlijmen I.M., van Haren B., Miller W.L., Bristow J.;

RT "A recessive form of the Ehlers-Danlos syndrome caused by tenascin-X

RT deficiency.";

RL New Engl. J. Med. 345:1167-1175(2001).

CC -!- FUNCTION: Appears to mediate interactions between cells and the 
CC extracellular matrix. Substrate-adhesion molecule that appears to 

CC inhibit cell migration. May play a role in supporting the growth 

CC of epithelial tumors. 

CC -!- SUBCELLULAR LOCATION: Secreted; extracellular matrix. 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=2; 

CC Comment=Additional isoforms seem to exist; 

CC Name=XB; 

CC IsoId=P22105-1; Sequence=Displayed;

CC Name=XB-short;

CC IsoId=P22105-2; Sequence=VSP_001418;

CC -!- TISSUE SPECIFICITY: Highly expressed in fetal adrenal, in fetal

CC testis, fetal smooth, striated and cardiac muscle. Isoform XB-

CC short is only expressed in the adrenal gland.

CC -!- DISEASE: Association with congenital adrenal hyperplasia.

CC -!- DISEASE: Defects in TNXB are the cause of Ehlers-Danlos-like

CC syndrome [MIM:606408]. This clinically distinct form of Ehlers-

CC Danlos syndrome is characterized by hyperextensible skin,

CC hypermobile joints, and tissue fragility, but it lacks atrophic 

CC scars and delayed wound healing. Inheritance is autosomal 

CC recessive. 

CC -!- SIMILARITY: Contains 19 EGF-like domains. 

CC -!- SIMILARITY: Contains 32 fibronectin type III domains. 

CC -!- SIMILARITY: Contains 1 fibrinogen C-terminal domain.

CC -!- CAUTION: There are two genes for TN-X: TNXA and TNXB. TNXA is a 

CC partial gene which can sometimes recombine with TNXB. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U89337; AAB47488.1; -.

DR EMBL; AF019413; AAB67981.1; -. 

DR EMBL; X71923; CAA50739.1; -. 

DR EMBL; Y13782; CAA74109.1; -. 

DR EMBL; Y13783; CAA74110.1; -. 

DR EMBL; U24488; AAB41287.1; -.

DR EMBL; U52696; AAC50889.1; -. 

DR EMBL; M25813; AAA35884.1; -. 

DR PIR; A40701; A40701. 

DR HSSP; P02671; 1FZD. 

DR Genew; HGNC:11976; TNXB.

DR MIM; 600985; -.

DR MIM; 606408; -.

DR GO; GO:0005578; C:extracellular matrix; NAS.

DR GO; GO:0007160; P:cell-matrix adhesion; NAS.

Query Match 18.9%; Score 682; DB 1; Length 4289;

Best Local Similarity 28.4%; Pred. No. 1.3e-36;

Matches 191; Conservative 41; Mismatches 178; Indels 262; Gaps 38;



Qy 94 CCPGFYESG EMC VP H CAD K C VH G RCIAPNTCQCEPGWGGTNCSSACDG 141 

III : : I : I : I I II I I I I I I I I I I I I 

Db 125 C C P AS AQAGT GQT DVRT LC SLHGVFDLSRC TCSCEPGWGGPTCSDPTDA 173



Qy 142 D HWGPHC TSRCQCKNGALCNPITG 165 

: : I I I : I : I : I 

Db 174 EIPPSSPPSASGSCPDDCNDQGRCVRGRCVCFPGYTGPSCGWPSCPGDCQGRGRC--VQG 231

Qy 166 ACHCAAGFRGWRCEDR-CEQGTYGNDCHQRCQCQNGATCDHVTGECRCPPGYTGAFC-ED 223 

I I I I I I I I I : I I I I : I : I I I I I I I I I 

Db 232 VCVCRAGFSGPDCSQRSCPRG CSQRGRCEG GRCVCDPGYTGDDCGMR 278 

Qy 224 LCPPGKHGPQCEQRCPCQNGGVCHHVTGECSCPSGWMGTVCG-QPCPEGRFGKNCSQECQ 282 

Ml I I I I : I I I I I : I II : I I I III: 

Db 279 SCPRG CSQRGRCEN GRCVCNPGYTGEDCGVRSCPRG CSQRGR 320



Qy 283 CHNGGTCDAATGQCHCSPGYTGERC-QDECPVGTYGVLCAETCQCVNGGKCYHVSGACLC 341 

i : I : I I I II I I I I II I I I : I I I I : I 



Db 321 CKD GRCVCDPGYTGEDCGTRSCP WDCGEGGRC--VDGRCVC 359

Qy 342 E AG FAGE RC E ARL CPE GLYGIKCDKR-CPCHLENTHSCHPMSG 383 

I : I I I I I I I I I I I I I I 

Db 360 WPGYTGEDCSTRTCPRDCRGRGRCEDGECICDTGYSGDDCGVRSCPGDCNQRGRCE--DG 417

Qy 384 ECACKPGWS GLYCNE TCSPGFYGEAC-QQIC--SCQNGADCD 422

I I I I I I I : I : I | | : | | : | : 

Db 418 RCVCWPGYTGTDCGSRACPRDCRGRGRCENGVCVCNAGYSGEDCGVRSCPGDCRGRGRCE 477

Qy 423 SVTGKCTCAPGFKGIDCST PCPLGTYGINCS S-RC--GCKND 461

I I : I I I I : I I I I 111:1111 I : 

Db 478 S--GRCMCWPGYTGRDCGTRACPGDCRGRGRCVDGRCVCNPGFTGEDCGSRRCPGDCRGH 535 

Qy 462 AVCS PVDGS CTCKAGWHGVDCS I R- CPSGTWGFGCNLTCQCLNG 504

: I I I I I I I : I I I I I II I I I I t I : I 

Db 536 GLCE--DGVCVCDAGYSGEDCSTRSCPGGCRGRG QCLDGRCVCEDGYSGEDCGVR 588 

Qy 505 GACNTLDGTCTCAPGWRGEKCELP CQDGTYGLN 537 

1111111:11: III 
Db 589 QCPNDCSQHGVCQ--DGVCICWEGYVSEDCSIRTCPSNCHGRGRCEEGRCLCDPGYTGPT 646

Qy 538 CAER CDCSHADGCHPTTGHCRCLPGWSGVHC DSV 571 

I I I I I I I I I I : I I I 

Db 647 CATRMCPADCRGRGRC--VQGVCLCHVGYGGEDCGQEEPPASACPGGCGPRELCRAGQCV 704

Qy 572 CAEGRWGPNCSL 583 

|||||:|:: 

Db 705 CVEGFRGPDCAI 716



RESULT 12 
NTC2_MOUSE 

ID NTC2_MOUSE STANDARD; PRT; 2470 AA. 

AC O35516; Q06008; Q60941;

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Neurogenic locus notch homolog protein 2 precursor (Notch 2) (Motch

DE B).

GN NOTCH2.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090;

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57BL/6; TISSUE=Thymus;

RA Hamada Y., Higuchi M., Tsujimoto Y.;

RT "Complete amino acid sequence and mutliform transcripts encoded by a

RT single copy of mouse Notch2 gene.";

RL Submitted (JUL-1994) to the EMBL/GenBank/DDBJ databases.

RN [2] 

RP SEQUENCE OF 316-1518 FROM N.A. 

RC STRAIN=C57BL/6 X CBA; TISSUE=Embryo;

RX MEDLINE=93178563; PubMed=8440332;

RA Lardelli M., Lendahl U.;



RT "Motch A and Motch B-two mouse Notch homologues coexpressed in a 

RT wide variety of tissues."; 

RL Exp. Cell Res. 204:364-372(1993).

RN [3] 

RP SEQUENCE OF 1765-2153 FROM N.A. 

RX MEDLINE=97075110; PubMed=8917536;

RA Milner L.A., Bigas A., Kopan R., Brashem-Stein C., Bernstein I.D.,

RA Martin D.I.;

RT "Inhibition of granulocytic differentiation by mNotch1.";

RL Proc. Natl. Acad. Sci. U.S.A. 93:13014-13019(1996).

RN [4]

RP FUNCTION.

RX MEDLINE=99396706; PubMed=10393120;

RA Hamada Y., Kadokawa Y., Okabe M., Ikawa M., Coleman J.R.,

RA Tsujimoto Y.;

RT "Mutation in ankyrin repeats of the mouse Notch2 gene induces early 

RT embryonic lethality."; 

RL Development 126:3415-3424(1999). 

RN [5] 

RP DEVELOPMENTAL STAGE, AND ALTERNATIVE SPLICING. 

RX MEDLINE=95333893; PubMed=7609614;

RA Higuchi M., Kiyama H., Hayakawa T., Hamada Y., Tsujimoto Y.;

RT "Differential expression of Notch1 and Notch2 in developing and adult

RT mouse brain.";

RL Brain Res. Mol. Brain Res. 29:263-272(1995).

RN [6] 

RP POST-TRANSLATIONAL PROCESSING, AND MUTAGENESIS OF MET-1699.

RX MEDLINE=21523956; PubMed=11518718;

RA Saxena M.T., Schroeter E.H., Mumm J.S., Kopan R.;

RT "Murine notch homologs (N1-4) undergo presenilin-dependent

RT proteolysis.";

RL J. Biol. Chem. 276:40268-40273(2001).

RN [7]

RP POST-TRANSLATIONAL PROCESSING, AND MUTAGENESIS OF MET-1699.

RX MEDLINE=21374376; PubMed=11459941;

RA Mizutani T., Taniguchi Y., Aoki T., Hashimoto N., Honjo T.;

RT "Conservation of the biochemical mechanisms of signal transduction 

RT among mammalian Notch family members."; 

RL Proc. Natl. Acad. Sci. U.S.A. 98:9026-9031(2001). 

CC -!- FUNCTION: Functions as a receptor for membrane-bound ligands 
CC Jagged1, Jagged2 and Delta1 to regulate cell-fate determination.

CC Upon ligand activation through the released notch intracellular 

CC domain (NICD) it forms a transcriptional activator complex with 

CC RBP-J kappa and activates genes of the enhancer of split locus.

CC Affects the implementation of differentiation, proliferation and

CC apoptotic programs (By similarity). May play an essential role in

CC postimplantation development, probably in some aspect of cell 

CC specification and/or differentiation. 

CC -!- SUBUNIT: Heterodimer of a C-terminal fragment N(TM) and a N- 
CC terminal fragment N(EC) which are probably linked by disulfide 

CC bonds.

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. Following 

CC proteolytical processing NICD is translocated to the nucleus. 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=2;

CC Name=1;

CC IsoId=O35516-1; Sequence=Displayed;

CC Name=2;

CC IsoId=O35516-2; Sequence=VSP_001405;

CC Note=No experimental confirmation available; 

CC -!- TISSUE SPECIFICITY: Expressed in the brain, liver, kidney, 

CC neuroepithelia, somites, optic vesicles and branchial arches, but 

CC not heart. 

CC -!- DEVELOPMENTAL STAGE: Expressed in the embryonic ventricular zone, 
CC the postnatal ependymal cells, and the choroid plexus throughout 

CC embryonic and postnatal development. 

CC -!- PTM: Synthesized in the endoplasmic reticulum as an inactive form 

CC which is proteolytically cleaved by a furin-like convertase in the 

CC trans-Golgi network before it reaches the plasma membrane to yield 

CC an active, ligand-accessible form. Cleavage results in a C- 

CC terminal fragment N(TM) and a N-terminal fragment N(EC). Following 

CC ligand binding, it is cleaved by TNF-alpha converting enzyme 

CC (TACE) to yield a membrane-associated intermediate fragment called 

CC notch extracellular truncation (NEXT). This fragment is then

CC cleaved by presenilin dependent gamma-secretase to release a 

CC notch-derived peptide containing the intracellular domain (NICD) 

CC from the membrane. 

CC -!- PTM: Phosphorylated. 

CC -!- SIMILARITY: Belongs to the NOTCH family. 

CC -!- SIMILARITY: Contains 35 EGF-like domains. 

CC -!- SIMILARITY: Contains 2 Lin/Notch repeats. 

CC -!- SIMILARITY: Contains 6 ANK repeats. 

CC --------------------------------------------------------------------------

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; D32210; BAA22094.1; -. 

DR EMBL; X68279; CAA48340.1; -. 

DR EMBL; U31881; AAC52924.1; -. 

DR PIR; A49175; A49175. 

DR HSSP; P16109; 1FSB. 

DR MGD; MGI:97364; Notch2.

DR GO; GO:0005887; C:integral to plasma membrane; IC.

DR GO; GO:0005515; F:protein binding; IPI.

DR GO; GO:0002011; P:morphogenesis of an epithelial sheet; IMP.

DR GO; GO:0007219; P:N signaling pathway; IC.

DR InterPro; IPR002110; ANK.

DR InterPro; IPR000152; Asx_hydroxyl_S.

DR InterPro; IPR000742; EGF_2.

DR InterPro; IPR001881; EGF_Ca.

DR InterPro; IPR001438; EGF_II. 

DR InterPro; IPR006209; EGF_like. 

DR InterPro; IPR002049; Laminin_EGF. 

DR InterPro; IPR008297; Notch. 

DR InterPro; IPR000800; Notch_dom. 

DR Pfam; PF00023; ank; 6.

DR Pfam; PF00008; EGF; 34.

DR Pfam; PF00066; notch; 2.

DR PIRSF; PIRSF002279; Notch; 1.

DR PRINTS; PR00010; EGFBLOOD.

DR PRINTS; PR00011; EGFLAMININ.

DR PRINTS; PR01452; NOTCH.

DR SMART; SM00248; ANK; 6.

DR SMART; SM00179; EGF_CA; 23.

DR SMART; SM00004; NL; 3.

DR PROSITE; 

DR PROSITE; 

DR PROSITE; 

DR PROSITE; 

DR PROSITE; 

DR PROSITE; 

DR PROSITE; 

KW Receptor; Transcription regulation; Activator; Differentiation; 

KW Developmental protein; Repeat; ANK repeat; EGF-like domain; 

KW Transmembrane; Glycoprotein; Signal; Phosphorylation; 

KW Alternative splicing. 
FT DOMAIN 105 143 EGF-LIKE 3.
FT DOMAIN 144 180 EGF-LIKE 4.
FT DOMAIN 182 219 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 221 256 EGF-LIKE 6 (INCOMPLETE).
FT DOMAIN 258 294 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 296 334 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 336 372 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 373 411 EGF-LIKE 10.
FT DOMAIN 413 452 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 454 490 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 492 528 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 530 566 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 568 603 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 605 641 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 643 678 EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 680 716 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 718 753 EGF-LIKE 19.
FT DOMAIN 755 791 EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 793 829 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 831 869 EGF-LIKE 22.
FT DOMAIN 871 907 EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 909 945 EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 947 983 EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 985 1021 EGF-LIKE 26, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1023 1059 EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1061 1097 EGF-LIKE 28.
FT DOMAIN 1099 1145 EGF-LIKE 29.
FT DOMAIN 1147 1183 EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1185 1221 EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1223 1260 EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1262 1300 EGF-LIKE 33.
FT DOMAIN 1302 1345 EGF-LIKE 34.
FT DOMAIN 1372 1410 EGF-LIKE 35.
FT REPEAT 1418 1454 LIN/NOTCH 1.
FT REPEAT 1501 1533 LIN/NOTCH 2.
FT REPEAT 1825 1869 ANK 1.

Query Match 18.8%; Score 677; DB 1; Length 2470; 

Best Local Similarity 24.7%; Pred. No. 1.8e-36; 

Matches 220; Conservative 78; Mismatches 246; Indels 348; Gaps 56; 

Qy 3 ISLNSCLSFICL LLCH WIGTASPLNLE--DPNVCSHW ES 39 

I : : I I I I : I : I I : : I I : I I : I 

Db 529 IDIDDCSSTPCLNGAKCIDHPNGYECQCATGFTGILCDENIDNCDPDPCHHGQCQDGIDS 588 

Qy 40 YSVTVQESYPHPF--DQI--YYTS CTDILNWFKCT RHRVSY 76

i : I lllhl | | : : | : : | : : : 

Db 589 YTCICNPGYMGAICSDQIDECYSSPCLNDGRCIDLVNGYQCNCQPGTSGLNCEINFDDCA 648 

Qy 77 RTAYRHG--EKTMYRRKSQCCPGF YESGEMCV 106 

II : I Mil II: 

Db 649 SNPCMHGVCVDGINRYSCVCSPGFTGQRCNIDIDECASNPCRKGATCINDVNGFRCICPE 708

Qy 107 PHC ADKCVHGRC IAPNTCQCEPGWGGTNCSSACDGDHWGPHCTSR 151 

II : : | : | | | : : I I : I I I I I I : II 
Db 709 GPHHPSCYSQVNECLSNPCIHGNCTGGLSGYKCLCDAGWVGWCE--VDKN ECLSN 762 

Qy 152 CQCKNGALCNPITGA--CHCAAGFRGWRCE DRC EQGTYGNDCH-QRCQCQ- 198

I : I I I I : | | | | : I : I : II I I I : I II 

Db 763 -PCQNGGTCNNLVNGYRCTCKKGFKGYNCQVNIDECASNPCLNQGTCFDDVSGYTCHCML 821 

Qy 199 --NGATCDHVTGECRCPPGYTGAFCED LCPPGKHGPQCE QRC PCQ 241

I I I I I II:: I I II I : I I IE 

Db 822 PYTGKNCQTVLAPCSPNPCENAAVCKEAPNFESFSCLCAPGWQGKRCTVDVDECISKPCM 881

Qy 242 NGGVCHHVTGE--CSCPSGWMGTVCGQPCPEGRFGKNCSQEC QCHNGGTC--DAATG 294

11111:1 I I I I : I I : : I lllhl I 

Db 882 NNGVCHNTQGS YVCECPPGFSGMDCEEDI NDCLANPCQNGGSCVDHVNTF 931 

Qy 295 QCHCSPGYTGERCQDE CPVGTYGVLC AETC-- 324

I I I I : I : : I I : I I I : I I I : I 

Db 932 SCQCHPGFIGDKCQTDMNECLSEPCKNGGTCSDYVNSYTCTCPAGFHGVHCENNIDECTE 991 

Qy 325 -QCVNGGKCYHVSG ACLCEAGFAG ERCEAR LC 355 

I I I I I I I : I I I I I I I : : I 

Db 992 SSCFNGGTC--VDGINSFSCLCPVGFTGPFCLHDINECSSNPCLNAGTCVDGLGTYRCIC 1049 

Qy 356 PEGLYGIKCD KRCPCHLENTHSCHPMSGECACKPGWSGLYCNE 398 

I I I I I I I : I I I III I II: 

Db 1050 PLGYTGKNCQTLWLCSRSPCKNKGTCVQEKARPHCLCPPGWDGAYCDVLNVSCKAAALQ 1109 

Qy 399 TCSPGFYGEACQQ ICS CQNGADCDSVTG-- 426

I I : I I : : I : I I : I I I : I 

Db 1110 KGVPVEHLCQHSGICINAGNTHHCQCPLGYTGSYCEEQLDECASNPCQHGATCNDFIGGY 1169 

Qy 427 KCTCAPGFKGIDCSTPCPLGTYGINCSSRCGCKNDAVCSPVDGSCTCKAGWHGVDCSIRC 486

: I I I I : : I : : I I : : I : I I : I I 
Db 1170 RCECVPGYQGVNCE YEVDECQNQPCQNGGTCIDLVNHFKCS C 1211 



Qy 487 PSGTWGFGC--NL-TC QCLNGGAC-NTLDG-TCTCAPGWRGEKCE LPC 529

I 1 I 1 | | : | 1 1 I 1 1 1 : : 1 1 1 1 1 1 : 1 1 : 1 1 1 1

Db 1212 PPGTRGLLCEENIDECAGGPHCLNGGQCVDRIGGYTCRCLPGFAGERCEGDINECLSNPC 1271

Qy 530 -QDGTYGLNCAE RCDCSHA DGCHPTTGHC 557

: 1 : 1 : 1 : III II II

Db 1272 SSEGS--LDCVQLKNNYNCICRSAFTGRHCETFLDVCPQKPCLNGGTCAVASNMPDGFIC 1329

Qy 558 RCLPGWSGVHCDSVCAEGRW GPNC SLPC 585

1111:11 II:: Ml IN

Db 1330 RCPPGFSGARLQSSCGQVKCRRGEQCIHTDSGPRCFCLNPKDCESGCASNPC 1381





RESULT 13 
NOTC_DROME 

ID NOTC_DROME STANDARD; PRT; 2703 AA.

AC P07207; O97458; P04154; Q9W4T8;

DT 01-NOV-1986 (Rel. 03, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE Neurogenic locus Notch protein precursor. 

GN N OR EG:140G11.1 OR EG:163A10.2 OR CG3936. 

OS Drosophila melanogaster (Fruit fly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;

OC Ephydroidea; Drosophilidae; Drosophila.

OX NCBI_TaxID=7227;

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Oregon-R; TISSUE=Embryo; 

RX MEDLINE=86079539; PubMed=3935325;

RA Wharton K.A., Johansen K.M., Xu T., Artavanis-Tsakonas S.;

RT "Nucleotide sequence from the neurogenic locus notch implies a gene 

RT product that shares homology with proteins containing EGF-like 

RT repeats."; 

RL Cell 43:567-581(1985).

RN [2]

RP SEQUENCE FROM N.A.

RC STRAIN=Canton-S, and Oregon-R; TISSUE=Embryo;

RX MEDLINE=87064624; PubMed=3097517;

RA Kidd S., Kelley M.R., Young M.W.;

RT "Sequence of the notch locus of Drosophila melanogaster: relationship

RT of the encoded protein to mammalian clotting and growth factors.";

RL Mol. Cell. Biol. 6:3094-3108(1986).

RN [3] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Berkeley; 

RX MEDLINE=20196006; PubMed=10731132;

RA Adams M.D., Celniker S.E., Holt R.A., Evans C.A., Gocayne J.D.,

RA Amanatides P.G., Scherer S.E., Li P.W., Hoskins R.A., Galle R.F.,

RA George R.A., Lewis S.E., Richards S., Ashburner M., Henderson S.N.,

RA Sutton G.G., Wortman J.R., Yandell M.D., Zhang Q., Chen L.X.,

RA Brandon R.C., Rogers Y.-H.C., Blazej R.G., Champe M., Pfeiffer B.D.,

RA Wan K.H., Doyle C., Baxter E.G., Helt G., Nelson C.R., Miklos G.L.G.,

RA Abril J.F., Agbayani A., An H.-J., Andrews-Pfannkoch C., Baldwin D.,

RA Ballew R.M., Basu A., Baxendale J., Bayraktaroglu L., Beasley E.M.,

RA Beeson K.Y., Benos P.V., Berman B.P., Bhandari D., Bolshakov S.,

RA Borkova D., Botchan M.R., Bouck J., Brokstein P., Brottier P.,

RA Burtis K.C., Busam D.A., Butler H., Cadieu E., Center A., Chandra I.,

RA Cherry J.M., Cawley S., Dahlke C., Davenport L.B., Davies P.,

RA de Pablos B., Delcher A., Deng Z., Mays A.D., Dew I., Dietz S.M.,

RA Dodson K., Doup L.E., Downes M., Dugan-Rocha S., Dunkov B.C., Dunn P.,

RA Durbin K.J., Evangelista C.C., Ferraz C., Ferriera S., Fleischmann W.,

RA Fosler C., Gabrielian A.E., Garg N.S., Gelbart W.M., Glasser K.,

RA Glodek A., Gong F., Gorrell J.H., Gu Z., Guan P., Harris M.,

RA Harris N.L., Harvey D.A., Heiman T.J., Hernandez J.R., Houck J.,

RA Hostin D., Houston K.A., Howland T.J., Wei M.-H., Ibegwam C.,

RA Jalali M., Kalush F., Karpen G.H., Ke Z., Kennison J.A., Ketchum K.A.,

RA Kimmel B.E., Kodira C.D., Kraft C., Kravitz S., Kulp D., Lai Z.,

RA Lasko P., Lei Y., Levitsky A.A., Li J.H., Li Z., Liang Y., Lin X.,

RA Liu X., Mattei B., McIntosh T.C., McLeod M.P., McPherson D.,

RA Merkulov G., Milshina N.V., Mobarry C., Morris J., Moshrefi A.,

RA Mount S.M., Moy M., Murphy B., Murphy L., Muzny D.M., Nelson D.L.,

RA Nelson D.R., Nelson K.A., Nixon K., Nusskern D.R., Pacleb J.M.,

RA Palazzolo M., Pittman G.S., Pan S., Pollard J., Puri V., Reese M.G.,

RA Reinert K., Remington K., Saunders R.D.C., Scheeler F., Shen H.,

RA Shue B.C., Siden-Kiamos I., Simpson M., Skupski M.P., Smith T.,

RA Spier E., Spradling A.C., Stapleton M., Strong R., Sun E.,

RA Svirskas R., Tector C., Turner R., Venter E., Wang A.H., Wang X.,

RA Wang Z.-Y., Wassarman D.A., Weinstock G.M., Weissenbach J.,

RA Williams S.M., Woodage T., Worley K.C., Wu D., Yang S., Yao Q.A.,

RA Ye J., Yeh R.-F., Zaveri J.S., Zhan M., Zhang G., Zhao Q., Zheng L.,

RA Zheng X.H., Zhong F.N., Zhong W., Zhou X., Zhu S., Zhu X., Smith H.O.,

RA Gibbs R.A., Myers E.W., Rubin G.M., Venter J.C.;

RT "The genome sequence of Drosophila melanogaster.";

RL Science 287:2185-2195(2000). 

RN [4] 

RP SEQUENCE FROM N.A.
RC STRAIN=Oregon-R;
RX MEDLINE=20196011; PubMed=10731137;
RA Benos P.V., Gatt M.K., Ashburner M., Murphy L., Harris D.,
RA Barrell B.G., Ferraz C., Vidal S., Brun C., Demailles J., Cadieu E.,
RA Dreano S., Gloux S., Lelaure V., Mottier S., Galibert F., Borkova D.,
RA Minana B., Kafatos F.C., Louis C., Siden-Kiamos I., Bolshakov S.,
RA Papagiannakis G., Spanos L., Cox S., Madueno E., de Pablos B.,
RA Modolell J., Peter A., Schoettler P., Werner M., Mourkioti F.,
RA Beinert N., Dowe G., Schaefer U., Jaeckle H., Bucheton A.,
RA Callister D.M., Campbell L.A., Darlamitsou A., Henderson N.S.,
RA McMillan P.J., Salles C., Tait E.A., Valenti P., Saunders R.D.C.,
RA Glover D.M.;
RT "From sequence to chromosome: the tip of the X chromosome of D.
RT melanogaster.";

RL Science 287:2220-2222(2000). 

RN [5] 

RP SEQUENCE OF 2505-2611 FROM N.A. 

RX MEDLINE=85099329; PubMed=2981631;
RA Wharton K.A., Yedvobnick B., Finnerty V.G., Artavanis-Tsakonas S.;

RT "opa: a novel family of transcribed repeats shared by the Notch locus 

RT and other developmentally regulated loci in D. melanogaster."; 

RL Cell 40:55-62(1985).

RN [6] 

RP SEQUENCE OF 1-8 FROM N.A. 

RX MEDLINE=87257846; PubMed=3037327;



RA Kelley M.R., Kidd S., Berg R.L., Young M.W.; 

RT "Restriction of P-element insertions at the Notch locus of Drosophila 

RT melanogaster.";
RL Mol. Cell. Biol. 7:1545-1548(1987).

RN [7] 

RP INTERACTION WITH DX, AND MUTANT SU42C. 

RX MEDLINE=94215489; PubMed=8162848;
RA Diederich R.J., Matsuno K., Hing H., Artavanis-Tsakonas S.;

RT "Cytosolic interaction between deltex and Notch ankyrin repeats 

RT implicates deltex in the Notch signaling pathway."; 

RL Development 120:473-481(1994).
RN [8]
RP INTERACTION WITH DX.
RX MEDLINE=95401878; PubMed=7671825;
RA Matsuno K., Diederich R.J., Go M.J., Blaumueller C.M.,

RA Artavanis-Tsakonas S.; 

RT "Deltex acts as a positive regulator of Notch signaling through 

RT interactions with the Notch ankyrin repeats."; 

RL Development 121:2633-2644(1995). 

RN [9] 

RP S3 CLEAVAGE BY PSN. 

RX MEDLINE=99221487; PubMed=10206646;

RA Struhl G., Greenwald I.; 

RT "Presenilin is required for activity and nuclear access of Notch in 

RT Drosophila.";

RL Nature 398:522-525(1999). 

RN [10] 

RP S3 CLEAVAGE BY PSN. 

RX MEDLINE=99221488; PubMed=10206647;

RA Ye Y., Lukinova N., Fortini M.E.; 

RT "Neurogenic phenotypes and altered Notch processing in Drosophila 

RT Presenilin mutants."; 

RL Nature 398:525-529(1999). 

RN [11] 

RP S2 CLEAVAGE BY KUZ.
RX MEDLINE=21657146; PubMed=11799064;
RA Lieber T., Kidd S., Young M.W.;

RT "kuzbanian-mediated cleavage of Drosophila Notch."; 

RL Genes Dev. 16:209-221(2002). 

RN [12] 

RP MUTANT MCD5. 

RX MEDLINE=21575956; PubMed=11719214;
RA Ramain P., Khechumian K., Seugnet L., Arbogast N., Ackermann C.,
RA Heitzler P.;
RT "Novel Notch alleles reveal a Deltex-dependent pathway repressing
RT neural fate.";
RL Curr. Biol. 11:1729-1738(2001).

RN [13] 

RP REVIEW. 

RX MEDLINE=22256570; PubMed=12369105;
RA Portin P.;

RT "General outlines of the molecular genetics of the Notch signalling 

RT pathway in Drosophila melanogaster: a review."; 

RL Hereditas 136:89-96(2002). 

CC -!- FUNCTION: Signaling protein, which regulates, with both positive 
CC and negative signals, the differentiation of at least central and 

CC peripheral nervous system and eye, wing disk, oogenesis, segmental 



CC appendages such as antennae and legs, and muscles, through lateral 

CC inhibition or induction. Functions as a receptor for membrane- 

CC bound ligands Delta and Serrate to regulate cell-fate 

CC determination. Upon ligand activation, and releasing from the cell 

CC membrane, the Notch intracellular domain (NICD) forms a 

CC transcriptional activator complex with Su(H) (Suppressor of 

CC hairless) and activates genes of the E(spl) complex. Essential for 

CC proper differentiation of ectoderm. 

CC -!- SUBUNIT: Interacts with Su(H) when activated. Interacts with Dx 
CC via its ANK repeats.

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. Upon activation and 
CC S3 cleavage, it is released from the cell membrane and enters into 

CC the nucleus in conjunction with Su(H).

CC -!- PTM: Upon binding its ligands such as Delta or Serrate, it is 
CC cleaved (S2 cleavage) in its extracellular domain, close to the 

CC transmembrane domain. S2 cleavage is probably mediated by Kuz. It 

CC is then cleaved (S3 cleavage) downstream of its transmembrane 

CC domain, releasing it from the cell membrane. S3 cleavage requires 

CC Psn. 

CC -!- SIMILARITY: Belongs to the NOTCH family. 

CC -!- SIMILARITY: Contains 36 EGF-like domains. 

CC -!- SIMILARITY: Contains 3 Lin/Notch repeats. 

CC -!- SIMILARITY: Contains 6 ANK repeats. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; M16152; AAB59220.1; -. 

DR EMBL; M16153; AAB59220.1; JOINED. 

DR EMBL; M16149; AAB59220.1; JOINED.
DR EMBL; M16150; AAB59220.1; JOINED.

DR EMBL; M16151; AAB59220.1; JOINED. 

DR EMBL; K03508; AAA28725.1; 

DR EMBL; M13689; AAA28725.1; JOINED. 

DR EMBL; K03507; AAA28725.1; JOINED. 

DR EMBL; AE003426; AAF45848.2; -. 

DR EMBL; AL035436; CAB37610.1; 

DR EMBL; AL035395; CAB37610.1; JOINED. 

DR EMBL; M12175; AAA74496.1; -. 

DR EMBL; M16025; AAA28726.1; 



Query Match 18.8%; Score 677; DB 1; Length 2703; 

Best Local Similarity 25.4%; Pred. No. 1.9e-36; 

Matches 208; Conservative 78; Mismatches 203; Indels 330; Gaps 51; 
Qy 7 SCL SFICLLLCHWIGTASPLNLED--PNVCSHWESYSVTVQESYPHPEDQI YYTSC 60 



Db 502 SCLDDPGTFRCVCMPGFTGTQCEIDIDECQSNPC LNDGTC 541 

Qy 61 TDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMCVPHCADKCVHGR 117

I : I I I I : | | | : I I : I : i 

Db 542 HDKINGFKCS CALGF--TGARCQINIDDCQSQPCRNR 576 



Qy 118 CIAPNTCQCEPGWGGTNCS SACDGDHWGPHCTSRCQCKNGALCNPITG-ACH 168

I I : I : I I i : I I : I : I I : II : : I 

Db 577 GICHDSIAGYSCECPPGYTGTSCEININDCDSN PCHRGKCIDDVNSFKCL 626

Qy 169 CAAGFRGW RCEDR CEQGTYG NDCHQRCQ 196 

I | : I : I : I I hill I : I I 

Db 627 CDPGYTGYICQKQINECESNPCQFDGHCQDRVGSYYCQCQAGTSGKNCEVNVNECHSN-P 685 

Qy 197 CQNGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQRCPCQNGGVC-HHVTG-E 252 

Ml: : I : I I I : II II I : : I I I I I I i I I : 

Db 686 CNNGATCIDGINSYKCQCVPGFTGQHCE KNVDECIS-SPCANNGVCIDQVNGYK 738

Qy 253 CSCPSGWMGTVC GQP CPEGRFGKNCS QECQ-- 282 

I I I I : I I I I I I I I II 

Db 739 CECPRGFYDAHCLSDVDECASNPCVNEGRCEDGINEFICHCPPGYTGKRCELDIDECSSN 798 

Qy 283 -CHNGGTC-DAATG-QCHCSPGYTGERCQ DECPVGTYGVLCAETCQCVNGGKCY-HV 335 

I : I I II I I I I M I I : : I : hi I I I I I I I 

Db 799 PCQHGGTCYDKLNAFSCQCMPGYTGQKCETNIDDC VTNPCGNGGTCIDKV 848 

Qy 336 SG-ACLCEAGFAGERCEARLCPEGLYGIKC-DKRCPCHLENTHSCHPMSG ECACKP 389 

: | | : |: I I I h : : I I II : I III III 

Db 849 NGYKCVCKVPFTGRDCESKMDP CASNRC KNEAKCTPSSNFLDFSCTCKL 897

Qy 390 GWSGLYCNE TCSPGFYGEAC--QQICS---CQN 417

I : : I I I : I I : I : I I hill 

Db 898 GYTGRYCDEDIDECSLSSPCRNGASCLNVPGSYRCLCTKGYEGRDCAINTDDCASFPCQN 957 

Qy 418 GADCDSVTG--KCTCAPGFKGIDCST PCPLGTYGI 450 

II I I I I I I I I I I I I I I 

Db 958 GGTCLDGIGDYSCLCVDGFDGKHCETDINECLSQPCQNGATCSQYVNSYTCTCPLGFSGI 1017 

Qy 451 NCS SRCGCKNDAVCSPVDG SCTCKAGWHGVDCSIR 485 

II : II I : I I : | : | | | : i : | : 

Db 1018 NCQTNDEDCTESSCLNGGSC--IDGINGYNCSCLAGYSGANCQYKLNKCDSNPCLNGATC 1075

Qy 486 CPSGTWGFGCNL TCQCLNGGACNTL--DGTCTCAPGWRGEKCE- 526

I I I I I I : III h : : I h It h h 

Db 1076 HEQNNEYTCHCPSGFTGKQCSEYVDWCGQSPCENGATCSQMKHQFSCKCSAGWTGKLCDV 1135 

Qy 527 --LPCQDGT--YGLNCAERCD CSHADGCHPTTGHCRCLPGWSGVHC 568

: I I I | |: : h I I I I h : I : I 

Db 1136 QTISCQDAADRKGLSLRQLCNNGTCKDYGNSHV CYCSQGYAGSYCQKEIDECQSQP 1191

Qy 569 DSVCAEGRWGPNCSL PC 585 

: I : I I i I I II 
Db 1192 CQNGGTCRDLIGAYECQCRQGFQGQNCELNIDDCAPNPC 1230 
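
The summary figures printed for this hit can be cross-checked with a little arithmetic. The sketch below is an inference from the numbers in this report, not a documented GenCore definition: it assumes that "Query Match" is the hit score divided by the perfect score (3601, as given in the report header) and that "Best Local Similarity" is the number of matches divided by the total number of alignment columns (matches + conservative + mismatches + indels). The function names are hypothetical.

```python
def query_match_pct(score, perfect_score):
    """Hit score as a percentage of the query's perfect (self) score.

    Assumed definition, inferred from the figures printed in this report.
    """
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns.

    Assumed definition, inferred from the figures printed in this report.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Figures copied from the NOTC_DROME hit above; 3601 is the perfect score
# listed in the report header.
PERFECT_SCORE = 3601
qm = query_match_pct(677, PERFECT_SCORE)
bls = best_local_similarity_pct(matches=208, conservative=78,
                                mismatches=203, indels=330)
print(f"Query Match {qm:.1f}%; Best Local Similarity {bls:.1f}%")
```

Under these assumptions the two values round to the 18.8% and 25.4% printed for this hit, which makes the same check useful for spotting digits damaged during text extraction in the other result blocks.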



RESULT 14 
NTC4_HUMAN 

ID NTC4_HUMAN STANDARD; PRT; 2003 AA.

AC Q99466; O00306; Q99458; Q99940; Q9H3S8; Q9UII9; Q9UIJ0;

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 



DE Neurogenic locus notch homolog protein 4 precursor (Notch 4) 

DE (hNotch4).
GN NOTCH4.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE FROM N.A. (ISOFORM 1), AND POLYMORPHISM OF POLY-LEU. 

RC TISSUE=Placenta; 

RX MEDLINE=97311416; PubMed=9168133;
RA Sugaya K., Sasanuma S.-I., Nohata J., Kimura T., Fukagawa T.,
RA Nakamura Y., Ando A., Inoko H., Ikemura T., Mita K.;
RT "Gene organization of human NOTCH4 and (CTG)n polymorphism in this
RT human counterpart gene of mouse proto-oncogene Int3.";
RL Gene 189:235-244(1997).

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORMS 1, 2 AND 3).
RC TISSUE=Bone marrow, and Heart;
RX MEDLINE=98360091; PubMed=9693032;
RA Li L., Huang G.M., Banta A.B., Deng Y., Smith T., Dong P.,
RA Friedman C., Chen L., Trask B.J., Spies T., Rowen L., Hood L.;
RT "Cloning, characterization, and the complete 56.8-kilobase DNA

RT sequence of the human NOTCH4 gene."; 

RL Genomics 51:45-58(1998). 

RN [3] 

RP SEQUENCE OF 1-503 FROM N.A., AND VARIANTS GLN-117 AND GLN-317.
RA Miyagawa T., Tokunaga K., Hojho H.;
RT "Human notch4 gene variant.";
RL Submitted (FEB-1999) to the EMBL/GenBank/DDBJ databases.
RN [4]
RP IDENTIFICATION OF LIGANDS.
RX MEDLINE=99180765; PubMed=10079256;
RA Gray G.E., Mann R.S., Mitsiadis E., Henrique D., Carcangiu M.-L.,
RA Banks A., Leiman J., Ward D., Ish-Horowitz D., Artavanis-Tsakonas S.;
RT "Human ligands of the Notch receptor.";
RL Am. J. Pathol. 154:785-794(1999).

CC -!- FUNCTION: Functions as a receptor for membrane-bound ligands
CC Jagged1, Jagged2 and Delta1 to regulate cell-fate determination.

CC Upon ligand activation through the released notch intracellular 

CC domain (NICD) it forms a transcriptional activator complex with 

CC RBP-J kappa and activates genes of the enhancer of split locus. 

CC Affects the implementation of differentiation, proliferation and 

CC apoptotic programs. May regulate branching morphogenesis in the 

CC developing vascular system (By similarity).

CC -!- SUBUNIT: Heterodimer of a C-terminal fragment N(TM) and a N- 
CC terminal fragment N(EC) which are probably linked by disulfide 

CC bonds (By similarity).

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. Following 

CC proteolytical processing NICD is translocated to the nucleus. 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=3; 

CC Comment=Experimental confirmation may be lacking for some 

CC isoforms; 

CC Name=1;
CC IsoId=Q99466-1; Sequence=Displayed;
CC Name=2;
CC IsoId=Q99466-2; Sequence=VSP_001406;
CC Name=3;
CC IsoId=Q99466-3; Sequence=VSP_001407;

CC -!- TISSUE SPECIFICITY: Highly expressed in the heart, moderately in 
CC the lung and placenta and at low levels in the liver, skeletal 

CC muscle, kidney, pancreas, spleen, lymph node, thymus, bone marrow 

CC and fetal liver. No expression was seen in adult brain or 

CC peripheral blood leukocytes. 

CC -!- PTM: Synthesized in the endoplasmic reticulum as an inactive form 

CC which is proteolytically cleaved by a furin-like convertase in the 

CC trans-Golgi network before it reaches the plasma membrane to yield 

CC an active, ligand-accessible form. Cleavage results in a C- 

CC terminal fragment N(TM) and a N-terminal fragment N(EC). Following

CC ligand binding, it is cleaved by TNF-alpha converting enzyme 

CC (TACE) to yield a membrane-associated intermediate fragment called 

CC notch extracellular truncation (NEXT). This fragment is then

CC cleaved by presenilin dependent gamma-secretase to release a 

CC notch-derived peptide containing the intracellular domain (NICD) 

CC from the membrane (By similarity).

CC -!- PTM: Phosphorylated (By similarity). 

CC -!- POLYMORPHISM: The poly-Leu region of NOTCH4 (in the signal 

CC peptide) is polymorphic and the number of Leu varies in the 

CC population (from 6 to 12).

CC -!- SIMILARITY: Belongs to the NOTCH family. 

CC -!- SIMILARITY: Contains 28 EGF-like domains. 

CC -!- SIMILARITY: Contains 3 Lin/Notch repeats. 

CC -!- SIMILARITY: Contains 5 ANK repeats. 

CC -!- CAUTION: Ref.1 sequence differs from that shown due to frameshifts
CC in position 1438 to 1463. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; D63395; BAA09708.1; ALT_FRAME. 

DR EMBL; D86566; BAA13116.1; -.

DR EMBL; U95299; AAC32288.1; -. 

DR EMBL; U89335; AAC63097.1; 

DR EMBL; AB023961; BAB20317.1; -. 

DR EMBL; AB024520; BAA88951.1; -. 

DR EMBL; AB024578; BAA88952.1; -. 

DR HSSP; P08709; 1BF9.
DR Genew; HGNC:7884; NOTCH4.
DR MIM; 164951; -.
DR InterPro; IPR002110; ANK.
DR InterPro; IPR000152; Asx_hydroxyl_S.
DR InterPro; IPR000742; EGF_2.
DR InterPro; IPR001881; EGF_Ca.
DR InterPro; IPR001438; EGF_II.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR002049; Laminin_EGF.
DR InterPro; IPR008297; Notch.
DR InterPro; IPR000800; Notch_dom.
DR Pfam; PF00023; ank; 6.
DR Pfam; PF00008; EGF; 26.
DR Pfam; PF00066; notch; 2.
DR PIRSF; PIRSF002279; Notch; 1.
DR PRINTS; PR00010; EGFBLOOD.
DR PRINTS; PR00011; EGFLAMININ.
DR PRINTS; PR01452; NOTCH.
DR SMART; SM00248; ANK; 5.
DR SMART; SM00179; EGF_CA; 11.
DR SMART; SM00004; NL; 3.
DR PROSITE; PS50297; ANK_REP_REGION; 1.
DR PROSITE; PS50088; ANK_REPEAT; 5.
DR PROSITE; PS00010; ASX_HYDROXYL; 11.
DR PROSITE; PS00022; EGF_1; 28.
DR PROSITE; PS01186; EGF_2; 21.
DR PROSITE; PS50026; EGF_3; 28.
DR PROSITE; PS01187; EGF_CA; 9.
KW Receptor; Transcription regulation; Activator; Differentiation;
KW Developmental protein; Repeat; ANK repeat; EGF-like domain;
KW Transmembrane; Glycoprotein; Signal; Phosphorylation; Polymorphism;
KW Triplet repeat expansion; Alternative splicing.
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Query Match 18.8%; Score 675.5; DB 1; Length 2003; 

Best Local Similarity 26.3%; Pred. No. 1.9e-36; 

Matches 211; Conservative 45; Mismatches 210; Indels 337; Gaps 46; 



Qy 94 CCPGFYESGEMCVPHCADKC VHGRCIAPNT CQCEPGWGGTNCSSACDGDH 143 

I I I I : I I I II III : I I I I I I I 

Db 105 CLPGF--TGERCQAKLEDPCPPSFCSKRGRCHIQASGRPQCSCMPGWTGEQCQLR 157 

Qy 144 WGPHCTSRCQCKNGALCNPITG--ACHCAAGFRGWRCE DRCEQG TYGNDCHQ- 193

I : : I I I : I I I I I I I I I : I I I I I

Db 158 --DFCSAN-PCVNGGVCLATYPQIQCHCPPGFEGHACERDVNECFQDPGPCPKGTSCHNT 214

Qy 194 RCQ CQNGATC DHVTGECRCPPGYTGAFCE 222 

II: I I I I I I I I I I I : I 1 I 

Db 215 LGSFQCLCPVGQEGPRCELRAGPCPPRGCSNGGTCQLMPEKDSTFHLCLCPPGFIGPDCE 274

Qy 223 DLCPPGKHG PQCEQRCP--CQNGGVCHH 248 

Ml I : I! : I I : I i I I : 

Db 275 VNPDNCVSHQCQNGGTCQDGLDTYTCLCPETWTGWDCSEDVDECETQGPPHCRNGGTCQN 334 

Qy 249 VTG--ECSCPSGWMGTVCGQP CPEGRFGKNCSQE- 280

I I I I I I I I I : I I I I I I i 

Db 335 SAGSFHCVCVSGWGGTSCEENLDDCIAATCAPGSTCIDRVGSFSCLCPPGRTGLLCHLED 394 

Qy 281 -C QCHNGGTC--DAATGQ--CHCSPGYTGERCQ DECPVGTYGVLCAETCQCVNG 329

I II 1:11 llllhll III: I I : I

Db 395 MCLSQPCHGDAQCSTNPLTGSTLCLCQPGYSGPTCHQDLDECLMAQQG PSPCEHG 449

Qy 330 GKCYHVSGA--CLCEAGFAGERCEAR LCPEGLYGIK 363

I I : I : II I I : I I I I I I I I I I I 

Db 450 GSCLNTPGSFNCLCPPGYTGSRCEADHNECLSQPCHPGSTCLDLLATFHCLCPPGLEGQL 509 



Qy 364 CD KRC PCHLENTHSCHPMSG--ECACKPGWSGLYCNE 398

I : I i I I II: : I I I I : I I I i 

Db 510 CEVETNECASAPC--LNHADCHDLLNGFQCICLPGFSGTRCEEDIDECRSSPCANGGQCQ 567 

Qy 399 TCSPGFYGEACQ QIC SCQNGADCDSVTGK--CTCAPGFKGIDCSTP 442

I I I i I I I I 1111:11111111 

Db 568 DQPGAFHCKCLPGFEGPRCQTEVDECLSDPCPVGASCLDLPGAFFCLCPSGFTGQLCEVP 627 

Qy 443 CPLGTYGI NCSSRCG-CKNDAVCSPVDGSCTC 473 

I I I : I | | : | I : III 
Db 628 LCAPNLCQPKQICKDQKDKANCLCPDGSPGCAPPEDNCTCHHGHCQR SSCVC 679 

Qy 474 KAGWHGVDC SIRCPSGTWGFGCN LTCQ CL 502 

| | I : I : | I : I I I : I II 

Db 680 DVGWTGPECEAELGGCISAPCAHGGTCYPQPSGYNCTCPTGYTGPTCSEEMTACHSGPCL 739

Qy 503 NGGACNTLDG--TCTCAPGWRGEKCE LPC QDGTYGLNCA 539

111:11 I I I I I I : I : I I : I I : II

Db 740 NGGSCNPSPGGYYCTCPPSHTGPQCQTSTDYCVSAPCFNGGTCVNRPGTFSCLCAMGFQG 799

Qy 540 ERCD CSHADGCH--PTTGHCRCLPGWSGVHCDS 570

II: I : I I I i I : : I I :

Db 800 PRCEGKLRPSCADSPCRNRATCQDSPQGPRCLCPTGYTGGSCQTLMDLCAQKPCPRNSHC 859

Qy 571 VCAEGRWGPNCSLP 584

: I : I I I I : I I
Db 860 LQTGPSFHCLCLQGWTGPLCNLP 882



RESULT 15 


NTC1_RAT
ID NTC1_RAT STANDARD; PRT; 2531 AA.
AC Q07008;
DT 01-NOV-1995 (Rel. 32, Created)
DT 15-JUL-1999 (Rel. 38, Last sequence update)
DT 28-FEB-2003 (Rel. 41, Last annotation update)
DE Neurogenic locus notch homolog protein 1 precursor (Notch 1).
GN NOTCH1.
OS Rattus norvegicus (Rat).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
OX NCBI_TaxID=10116;
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=Schwann cell;
RX MEDLINE=92111383; PubMed=1764995;
RA Weinmaster G., Roberts V.J., Lemke G.;
RT "A homolog of Drosophila Notch expressed during mammalian
RT development.";
RL Development 113:199-205(1991).
RN [2]
RP REVISIONS TO 1652-1653.
RA Weinmaster G.;
RL Submitted (APR-1998) to the EMBL/GenBank/DDBJ databases.
RN [3]
RP FUNCTION.
RX MEDLINE=21094508; PubMed=11182080;



RA Tanigaki K., Nogaki F., Takahashi J., Tashiro K., Kurooka H.,
RA Honjo T.;
RT "Notch1 and Notch3 instructively restrict bFGF-responsive multipotent

RT neural progenitor cells to an astroglial fate."; 

RL Neuron 29:45-55(2001). 

RN [4] 

RP TISSUE SPECIFICITY.
RX MEDLINE=93202015; PubMed=1295745;
RA Weinmaster G., Roberts V.J., Lemke G.;

RT "Notch2: a second mammalian Notch gene."; 

RL Development 116:931-941(1992). 

RN [5] 

RP TISSUE SPECIFICITY. 

RX MEDLINE=21331789; PubMed=11438922;
RA Irvin D.K., Zurcher S.D., Nguyen T., Weinmaster G., Kornblum H.I.;
RT "Expression patterns of Notch1, Notch2, and Notch3 suggest multiple
RT functional roles for the Notch-DSL signaling system during brain
RT development.";

RL J. Comp. Neurol. 436:167-181(2001). 

CC -!- FUNCTION: Functions as a receptor for membrane-bound ligands 
CC Jagged1, Jagged2 and Delta1 to regulate cell-fate determination.

CC Upon ligand activation through the released notch intracellular 

CC domain (NICD) it forms a transcriptional activator complex with 

CC RBP-J kappa and activates genes of the enhancer of split locus. 

CC Affects the implementation of differentiation, proliferation and 

CC apoptotic programs (By similarity). Acts instructively to control

CC the cell fate determination of CNS multipotent progenitor cells, 

CC resulting in astroglial induction and neuron/oligodendrocyte 

CC suppression. 

CC -!- SUBUNIT: Heterodimer of a C-terminal fragment N(TM) and a N- 
CC terminal fragment N(EC) which are probably linked by disulfide
CC bonds (By similarity).

CC -!- SUBCELLULAR LOCATION: Type I membrane protein. Following 

CC proteolytical processing NICD is translocated to the nucleus (By 

CC similarity).

CC -!- TISSUE SPECIFICITY: Expressed in the brain, kidney and spleen. 
CC Expressed in postnatal central nervous system (CNS) germinal zones 

CC and, in early postnatal life, within numerous cells throughout the 

CC CNS. Found in both subventricular and ventricular germinal zones. 

CC -!- DEVELOPMENTAL STAGE: In the embryo, highest levels occur between 
CC days 12 and 14 and decrease rapidly to much lower levels in the 

CC adult. 

CC -!- PTM: Synthesized in the endoplasmic reticulum as an inactive form 

CC which is proteolytically cleaved by a furin-like convertase in the 

CC trans-Golgi network before it reaches the plasma membrane to yield 

CC an active, ligand-accessible form. Cleavage results in a C- 

CC terminal fragment N(TM) and a N-terminal fragment N(EC). Following 

CC ligand binding, it is cleaved by TNF-alpha converting enzyme 

CC (TACE) to yield a membrane-associated intermediate fragment called 

CC notch extracellular truncation (NEXT). This fragment is then

CC cleaved by presenilin dependent gamma-secretase to release a 

CC notch-derived peptide containing the intracellular domain (NICD) 

CC from the membrane (By similarity).

CC -!- PTM: Phosphorylated (By similarity). 

CC -!- SIMILARITY: Belongs to the NOTCH family. 

CC -!- SIMILARITY: Contains 36 EGF-like domains. 

CC -!- SIMILARITY: Contains 3 Lin/Notch repeats.
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SIMILARITY: Contains 5 ANK repeats. 



CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X57405; CAA40667.1; -.
DR HSSP; P00740; 1EDM.
DR InterPro; IPR002110; ANK.
DR InterPro; IPR000152; Asx_hydroxyl_S.
DR InterPro; IPR000742; EGF_2.
DR InterPro; IPR001881; EGF_Ca.
DR InterPro; IPR001438; EGF_II.
DR InterPro; IPR006209; EGF_like.
DR InterPro; IPR002049; Laminin_EGF.
DR InterPro; IPR008297; Notch.
DR InterPro; IPR000800; Notch_dom.
DR Pfam; PF00023; ank; 6.
DR Pfam; PF00008; EGF; 35.
DR Pfam; PF00066; notch; 3.
DR PIRSF; PIRSF002279; Notch; 1.
DR PRINTS; PR00010; EGFBLOOD.
DR PRINTS; PR00011; EGFLAMININ.
DR PRINTS; PR01452; NOTCH.
DR SMART; SM00248; ANK; 6.
DR SMART; SM00179; EGF_CA; 25.
DR SMART; SM00004; NL; 2.
DR PROSITE; PS50297; ANK_REP_REGION; 1.
DR PROSITE; PS50088; ANK_REPEAT; 4.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS00022; EGF_1; 35.
DR PROSITE; PS01186; EGF_2; 26.
DR PROSITE; PS50026; EGF_3; 36.
DR PROSITE; PS01187; EGF_CA; 21.
KW Receptor; Transcription regulation; Activator; Differentiation;
KW Developmental protein; Repeat; ANK repeat; EGF-like domain;
KW Transmembrane; Glycoprotein; Signal; Phosphorylation.
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601 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 603 639 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 641 676 EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 678 714 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 716 751 EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 753 789 EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 791 827 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 829 867 EGF-LIKE 22.
FT DOMAIN 869 905 EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 907 943 EGF-LIKE 24.
FT DOMAIN 945 981 EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 983 1019 EGF-LIKE 26.
FT DOMAIN 1021 1057 EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1059 1095 EGF-LIKE 28.
FT DOMAIN 1097 1143 EGF-LIKE 29.
FT DOMAIN 1145 1181 EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1183 1219 EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1221 1265 EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1267 1305 EGF-LIKE 33.
FT DOMAIN 1307 1346 EGF-LIKE 34.
FT DOMAIN 1348 1384 EGF-LIKE 35.
FT DOMAIN 1387 1426 EGF-LIKE 36.
FT REPEAT 1445 1480 LIN/NOTCH 1.
FT REPEAT 1481 1522 LIN/NOTCH 2.
FT REPEAT 1523 1562 LIN/NOTCH 3.
FT REPEAT 1917 1946 ANK 1.
FT REPEAT 1950 1980 ANK 2.
FT REPEAT 1984 2013 ANK 3.
FT REPEAT 2017 2046 ANK 4.
FT REPEAT 2050 2079 ANK 5.
FT DOMAIN 1730 1733 POLY-ALA.
FT DOMAIN 1891 1894 POLY-GLU.
FT DOMAIN 2258 2261 POLY-PRO.
FT DOMAIN 2497 2500 POLY-SER.
FT SITE 1654 1655 CLEAVAGE BY (FURIN-LIKE PROTEASE) (BY
FT SIMILARITY).
FT DISULFID 24 37 BY SIMILARITY.
FT DISULFID 31 46 BY SIMILARITY.
FT DISULFID 48 57 BY SIMILARITY.
FT DISULFID 63 74 BY SIMILARITY.
FT DISULFID 68 87 BY SIMILARITY.
FT DISULFID 89 98 BY SIMILARITY.
FT DISULFID 106 117 BY SIMILARITY.
FT DISULFID 111 127 BY SIMILARITY.
FT DISULFID 129 138 BY SIMILARITY.
FT DISULFID 144 155 BY SIMILARITY.
FT DISULFID 149 164 BY SIMILARITY.
FT DISULFID 166 175 BY SIMILARITY.
FT DISULFID 182 195 BY SIMILARITY.
FT DISULFID 189 204 BY SIMILARITY.







FT DISULFID 206 215 BY SIMILARITY. 

FT DISULFID 222 233 BY SIMILARITY. 

FT DISULFID 227 243 BY SIMILARITY. 
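
Feature-table rows such as the DISULFID lines above follow a simple layout: the `FT` line code, a feature key, two residue positions, and a free-text description. The sketch below shows one way to split such a row into fields. It assumes the whitespace-collapsed layout seen in this report rather than the fixed-column layout of the original Swiss-Prot flat file, and the function name `parse_ft_line` is hypothetical.

```python
import re

# "FT", a feature key, two integer positions, then an optional description.
# Assumes single-space-separated fields as they appear in this report.
FT_LINE = re.compile(r"^FT\s+(\S+)\s+(\d+)\s+(\d+)\s*(.*)$")


def parse_ft_line(line):
    """Return (key, start, end, description), or None for non-FT rows."""
    match = FT_LINE.match(line.strip())
    if match is None:
        return None
    key, start, end, description = match.groups()
    return key, int(start), int(end), description.strip()


rows = [
    "FT DISULFID 206 215 BY SIMILARITY.",
    "FT DISULFID 222 233 BY SIMILARITY.",
    "FT DISULFID 227 243 BY SIMILARITY.",
]
for row in rows:
    key, first, second, note = parse_ft_line(row)
    # For DISULFID rows the two positions are the bonded cysteines,
    # not the ends of a contiguous region.
    print(f"{key}: Cys {first} - Cys {second} ({note})")
```

Continuation rows that carry only description text (no key or positions) do not match the pattern and return `None`, so a caller would need to append them to the previous feature.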

Query Match 18.7%; Score 675; DB 1; Length 2531;

Best Local Similarity 25.7%; Pred. No. 2.4e-36; 

Matches 208; Conservative 70; Mismatches 206; Indels 324; Gaps 48; 

Qy 4 SLNSCLS FICLLLCHWIGTASPLNLEDPNVCSHWESYS VTVQESY 48

: : I I I : : I I I I : I I : I I :
Db 603 NINECHSQPCRHGGTCQDRDNYYLCLCLKGTTGPNCEINLDD CA 646

Qy 49 PHPFDQIYYTSCTDILNWFKCTRHRVSYRTAYRHGEKTMYRRKSQCCPGFYESGEMC 105

: | | : | | : : : : I I I I : : I I I 

Db 647 SNPCDS GTCLDKIDGYECA CEPGY--TGSMCNVN 678 

Qy 106 VPHCA DKCVHGRC IAPNT 123

: I i : I : I I | : 

Db 679 IDECAGSPCHNGGTCEDGIAGFTCRCPEGYHDPTCLSEVNECNSNPCIHGACRDGLNGYK 738 

Qy 124 CQCEPGWGGTNC SSACDGDHWGPHCTSRCQCKNGALCNPITG--ACHCAAGFRGWRC 178 

I I I I I I I II : : I : : III: I I I I i I 

Db 739 CDCAPGWSGTNCDINNNECESN PCVNGGTCKDMTSGYVCTCREGFSGPNC 788 

Qy 179 EDRCEQGTYGNDCHQRCQCQNGATC-DHVTG-ECRCPPGYTGAFCEDLCPPGKHGPQCEQ 236 

: I I : I I I I I I I I : I I I I I i I I I : I I 
Db 789 Q TNINECASN-PCLNQGTCIDDVAGYKCNCPLPYTGATCEWLAP C-A 834 

Qy 237 RCPCQNGGVC HHVTGECSCPSGWMGTVC GQPCPEGR 272 

I I : I I I I : : I I I : I I I I II I 

Db 835 TSPCKNSGVCKESEDYESFSCVCPTGWQGQTCEIDINECVKSPCRHGASCQNTNGSYRCL 894 

Qy 273 FGKNCS QECQ CHNGGTCDAATGQ- --CHCSPGYTGERCQDE 310 

I : I I : I : II I M : I | | | | : | | : : : 

Db 895 CQAGYTGRNCESDIDDCRPNPCHNGGSCTDGVNAAFCDCLPGFQGAFCEEDINECATNPC 954

Qy 311 CPVGTYGVLCAET CQCVNGGKCYHVSG ACLCEAG 344

I I I I : I I I I I I I I I I I I

Db 955 QNGANCTDCVDSYTCTCPTGFNGIHCENNTPDCTESSCFNGGTC--VDGINSFTCLCPPG 1012

Qy 345 FAGERCEARLCPEGLYGI-KCDKRCPCHLENTHSCHPMSG---ECACKPGWSGLYCNE 398

Ml: I :: I I I I I II I : I I I : : I I I

Db 1013 FTGSYCQ YDVNECDSR-PCLHGGT--CQDSYGTYKCTCPQGYTGLNCQNLVR 1061

Qy 399 --TCSPGFYGEACQQI CSCQN GADCDSVTGKCTCAPGFKGIDCSTPCPLGTY 448 

: I III ||:: I : I I : : I I : I I I : I I 

Db 1062 WCDSAPCKNGGKCWQTNTQYHCECRSGWTGFNCDVLSVSCEVAAQKRGIDVTLLCQHGGL 1121

Qy 449 GIN CSSRCG CKNDAVCSPVDG--SCTCKAGWHGVDCS 483

: : | : I hi I h I I I I I I : I I : I I 

Db 1122 CVDEEDKHYCHCQAGYTGSYCEDEVDECSPNPCQNGATCTDYLGGFSCKCVAGYHGSNCS 1181 

Qy 484 IRCPSGTWGFGCNLT C QCLNG 504 

I I I I I I : I = I I 

Db 1182 EEINECLSQPCQNGGTCIDLTNTYKCSCPRGTQGVHCEINVDDCHPPLDPASRSPKCFNN 1241 



Qy 505 GACNTLDG--TCTCAPGWRGEKCE- LPCQD-GTYGLNCAERCDCSHADGCHPT 553



II I I I I I I I : I I : I I M II I I : I : 
Db 1242 GTCVDQVGGYTCTCPPGFVGERCEGDVNECLSNPCDPRGTQ--NCVQRVN 1289

Qy 554 TGHCRCLPGWSGVHCDSVCAEGRWGPNC 581 

III I : I I : I I f I I 
Db 1290 DFHCECRAGHTGRRCESV-INGCRGKPC 1316 



Search completed: March 26, 2004, 16:09:53 
Job time : 14.8459 secs



